[Shanghai, China, September 21, 2023] At HUAWEI CONNECT 2023, Yang Chaobin, Huawei's Director of the Board and President of ICT Products & Solutions, unveiled the industry-leading high-performance AI knowledge repository storage — OceanStor A800. With high-performance data infrastructure, OceanStor A800 is designed to accelerate the training and inference of industry-specific models, paving the way to the AI era.
The advent of digital and intelligent technologies is ushering in a wave of new foundation models and the rapid development of industry-specific models. Enterprises currently face three challenges in developing and applying large AI models:
First, large AI models have evolved from uni-modal to multi-modal paradigms. The training set consists mainly of massive numbers of small files like text and images. However, massive numbers of small files currently load at a speed of less than 100 MB/s, falling short of the speeds required by GPU computing. This results in a huge waste of computing resources.
Second, to better cope with network fluctuations, computing power faults, and parameter optimizations, TB-level checkpoints must be saved every two hours. However, this requires temporarily pausing GPU training, and the write performance for large files is insufficient, resulting in prolonged GPU idle time.
Third, the sluggish update of industry-specific knowledge in large AI models can result in AI hallucinations, where a large AI model may fabricate answers. This is why improving the practicability of large AI models across various industries is an urgent priority.
Huawei is therefore launching the OceanStor A800 High-Performance AI Knowledge Repository Storage with three key capabilities, to address these challenges and overcome the bottleneck in data processing efficiency for large model training and inference, thus accelerating the rollout of large AI model applications.
• Ultra-high performance: Thanks to the groundbreaking data-control separation architecture, Huawei OceanStor A800 greatly improves processing performance for small files (up to 24 million IOPS per controller enclosure). Additionally, training-data loading is 4 times more efficient than that of the next best in the industry.
• Rapid recovery: With the innovative disk-controller collaboration technology and NFS+ parallel client, Huawei OceanStor A800 delivers 500 GB/s bandwidth, meaning that TB-level checkpoint read/write can be completed in 10+ seconds. Training therefore resumes from checkpoints three times faster than with the industry's next best, which also shortens the overall training period.
• Robust inference: Huawei OceanStor A800 includes a built-in high-speed vector knowledge repository that improves the depth, precision, and timeliness of industry-specific knowledge within large AI models, thus eliminating AI hallucinations. Supported by the intelligent vector retrieval engine, OceanStor A800 boosts vector retrieval speeds to 250,000+ QPS (30% higher than that of the industry's next best), with millisecond-level inference response.
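To put the checkpoint figures above in context, the following is a rough back-of-the-envelope sketch, not a Huawei tool or benchmark. It assumes decimal units (1 TB = 1,000 GB), a fully sustained 500 GB/s, and no protocol or metadata overhead; the function names and the example checkpoint sizes are hypothetical and chosen only for illustration.

```python
def transfer_seconds(size_tb: float, bandwidth_gb_per_s: float) -> float:
    """Ideal time to read or write a checkpoint of size_tb terabytes.

    Assumes decimal units (1 TB = 1000 GB) and a fully sustained
    bandwidth, which real systems only approximate.
    """
    if bandwidth_gb_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return size_tb * 1000.0 / bandwidth_gb_per_s


def idle_fraction(write_seconds: float, interval_hours: float) -> float:
    """Share of wall-clock time the GPUs spend paused for checkpoint writes.

    Assumes training pauses for the whole write and that interval_hours
    is the training time between two consecutive checkpoints.
    """
    training_seconds = interval_hours * 3600.0
    return write_seconds / (training_seconds + write_seconds)


if __name__ == "__main__":
    # Hypothetical checkpoint sizes; 500 GB/s is the bandwidth quoted above.
    for size_tb in (1.0, 5.0):
        seconds = transfer_seconds(size_tb, 500.0)
        share = idle_fraction(seconds, 2.0)
        print(f"{size_tb:.0f} TB checkpoint: {seconds:.0f} s, "
              f"idle share with 2-hour interval: {share:.4%}")
```

Under these idealized assumptions, a 5 TB checkpoint would take about 10 seconds at 500 GB/s. Real transfer times depend on the workload, network, and file layout.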
In the era of large AI models, data determines the development of AI. With no increase in available computing resources, the key to enterprise success is to deliver higher training and inference efficiency for large AI models while reducing costs. Guided by the concept of "using storage power to enhance computing power", Huawei Data Storage remains committed to providing innovative AI storage products and solutions that will enable AI across a vast range of industries.
For more information about Huawei storage solutions, visit https://e.huawei.com/en/products/storage.