ZStack AIOS Product Introduction

ZStack AIOS is a self-developed, productized, and standardized next-generation AI infrastructure operating system. Centered on AI, it enables AI innovation through three key layers: the computing power layer, the model layer, and the operational layer. It supports seamless upgrades from existing cloud platforms and remains compatible with all cloud infrastructure modules, product documentation, and after-sales services.

ZStack named a major vendor in IDC's China Generative AI Application Development Platforms report
The ZStack AIOS platform mentioned in the report is a next-generation AI infrastructure (AI Infra) platform released by ZStack in August 2024. As an all-in-one platform, it delivers improved cost-effectiveness.
ZStack named a representative vendor in the IDC TechScape: China Generative AI Technology report
ZStack will continue to focus on the AI infrastructure field, providing more stable, efficient, and intelligent AI solutions for customers through continuous technological innovation and product optimization.

Product Solution

Compute Layer
Model Layer
Operational Layer
Precision Scheduling Platform for Compute Power
The ZStack AIOS platform provides multi-engine support for bare metal, virtual machines, and containers. Through GPU partitioning it precisely quantifies heterogeneous AI compute power, with management granularity down to 1%, significantly reducing compute costs. Building on this precise quantification, the compute layer's distributed collaborative scheduling unifies the management and dynamic scheduling of heterogeneous compute power, enabling fine-grained resource reuse and further cost reduction.
Enterprise Pain Points
Multi-architecture, multi-brand GPUs must be managed together; AI computing power allocation is opaque, utilization is low, and maintenance costs are high.
AI computing power is scarce and expensive.
Heterogeneous AI computing power cannot interoperate, creating silos and leading to low resource utilization.
Solution Highlights
Supports the deployment of AI models on bare metal, virtual machines, and containers with multiple engines, reducing the entry barrier for AI applications.
Unified management and dynamic scheduling of heterogeneous AI computing power to achieve refined resource reuse.
Enables quantifiable management of AI computing power at granularity as fine as 1%, reducing AI computing costs.
GPU passthrough delivers up to 95% of native physical performance, and vGPU partitioning requires no license from GPU vendors, improving AI computing power utilization.
Collaboration of heterogeneous computing power, widely compatible with mainstream AI chips.
Real-time monitoring of resource utilization and self-healing of service failures, reducing operational and maintenance costs.
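To make the idea of 1%-granularity compute quantification concrete, the sketch below shows a toy scheduler that hands out fractional slices of heterogeneous GPUs in 1% increments. All names here (`FractionalGpuPool`, `allocate`, `release`) are hypothetical illustrations of the concept, not ZStack AIOS APIs.

```python
# Illustrative sketch only: fractional GPU allocation in 1% increments
# across heterogeneous devices. Hypothetical names, not ZStack code.

class FractionalGpuPool:
    def __init__(self, devices):
        # devices: {"device name": free capacity in percent}; each starts at 100.
        self.free = dict(devices)

    def allocate(self, percent):
        """Grant `percent` (an integer, 1-100) from the first device
        with enough free capacity; return that device's name."""
        if not 1 <= percent <= 100:
            raise ValueError("allocation must be between 1 and 100 percent")
        for name, free in self.free.items():
            if free >= percent:
                self.free[name] = free - percent
                return name
        raise RuntimeError("no GPU has enough free capacity")

    def release(self, name, percent):
        # Return a slice to the pool, never exceeding full capacity.
        self.free[name] = min(100, self.free[name] + percent)

# Two heterogeneous GPUs managed as one pool of percent-granular slices.
pool = FractionalGpuPool({"nvidia-a100": 100, "ascend-910": 100})
g1 = pool.allocate(30)  # fits on the first device
g2 = pool.allocate(80)  # spills to the second device
```

A real scheduler would also weigh device architecture, topology, and workload placement policy; this sketch only illustrates how percent-level quantification enables fine-grained resource reuse.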

Advantage

  • Low Entry Barrier, Quick to Get Started
    Fast construction
    Minimum two-node lightweight deployment
    Easy to use
  • Full-link Services
    One-stop experience
    ● AI data management
    ● AI training and inference
    ● Application development and operations
    ● Computing power maintenance and operations
  • High Cost-effectiveness
    Cost-saving
    Dynamic and flexible GPU partitioning
    High hardware utilization
    Strong Performance
    High-performance storage network
  • Security and Trustworthiness
    Privacy
    Localized Data Management
    File-level Data Isolation
    Security
    High Availability and Disaster Recovery Services

Use Cases

Model Training and Optimization
We offer comprehensive model fine-tuning services across sectors such as film and media, healthcare, education, government, telecommunications, and intelligent computing centers, covering everything from compute provisioning to the storage of industry-specific training datasets.
Model Inference
Cloud-based AI computing services serve inference workloads for a wide range of AI applications, improving inference efficiency.
AI Model Application Deployment
Deploy RAG knowledge base applications locally, with support for multiple inference service orchestration strategies and plugin integration, enabling rapid rollout of AI applications.
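The RAG pattern described above can be sketched in miniature: retrieve the most relevant passage from a local knowledge base, then assemble a grounded prompt for an inference service. This is a generic toy illustration of the technique, with a naive word-overlap retriever standing in for a real vector search; none of it is ZStack AIOS code.

```python
# Illustrative sketch only: a toy local RAG step. A production system
# would use embedding-based vector retrieval, not word overlap.

def retrieve(question, passages):
    """Return the knowledge-base passage sharing the most words with the question."""
    q_words = set(question.lower().split())
    return max(passages, key=lambda p: len(q_words & set(p.lower().split())))

def build_prompt(question, passages):
    """Ground the model's answer in the retrieved passage."""
    context = retrieve(question, passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# A tiny local knowledge base (facts taken from this page).
kb = [
    "ZStack AIOS supports bare metal, virtual machine, and container engines.",
    "GPU partitioning enables management precision down to 1%.",
]
prompt = build_prompt("How precise is GPU partitioning?", kb)
```

The resulting prompt would then be sent to whichever inference service the orchestration strategy selects.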

Talk to a ZStack Expert

    ZStack may use your personal data to inform you about its products, services, and events


    I have read and agree to the
    Site Terms, Privacy Policy, and Rules and Conventions on User Management of ZStack.

    You can stop receiving marketing emails by clicking the unsubscribe link in each email to withdraw your consent.