CloudEngine 16800 Analysis 1: AI Chip + Algorithm Reaches 100% AI Computing Power

2019-01-18

Artificial Intelligence (AI) will drive the fourth industrial revolution toward an intelligent world in which all things sense, connect, and are intelligent. As stated in Huawei’s Global Industry Vision (GIV) 2025, there will be 180 zettabytes (180 billion terabytes) of data by 2025, 95 percent of it unstructured data (voice and video) that will depend on AI for processing. AI will be adopted by 86 percent of enterprises, which will increasingly use it to make decisions, reshape business models and ecosystems, and rebuild customer experiences. Data centers must therefore evolve from the cloud era to the AI era.

The birth of new business models, the rapid development of the industry, and the advancement of user experience depend on AI application development such as:

• Facial recognition payment that requires intelligent recognition of hundreds of millions of images.
• Auxiliary or in-depth health diagnosis that is based on thousands of algorithm models.
• Intelligent recommendation that enables a smooth online shopping experience and depends on the intelligent computing of hundreds of servers.

After the breakthrough in deep learning algorithms, data processing efficiency became the bottleneck that obstructed AI’s large-scale commercial use. To improve the efficiency of AI services, storage media evolved from Hard Disk Drives (HDDs) to Solid-State Drives (SSDs), reducing latency by more than 100-fold. In the computing field, GPU servers and even dedicated AI chips are used to improve data processing capability by more than 100-fold. In addition, the communication protocol evolved from TCP/IP to Remote Direct Memory Access (RDMA). With storage and computing accelerated, network communication latency has become the key weakness, so AI services urgently need data center networks with zero packet loss, low latency, and high throughput.
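The shift of the bottleneck to the network can be illustrated with a small calculation. The sketch below is only an illustration: the baseline latencies are hypothetical figures chosen for the example, not measurements, and the function name `network_share` is invented here. It applies the 100-fold storage and computing improvements cited above while leaving network latency unchanged.

```python
def network_share(storage_ms, compute_ms, network_ms):
    """Return the fraction of end-to-end time spent in the network."""
    total = storage_ms + compute_ms + network_ms
    return network_ms / total


# Hypothetical per-request latencies in milliseconds (assumed, not measured).
before = network_share(storage_ms=10.0, compute_ms=10.0, network_ms=1.0)

# Storage and computing each improve 100-fold; the network stays the same.
after = network_share(storage_ms=10.0 / 100, compute_ms=10.0 / 100, network_ms=1.0)

print(f"network share before: {before:.0%}")
print(f"network share after:  {after:.0%}")
```

Under these assumed numbers, the network's share of total time rises from roughly 5 percent to more than 80 percent, which is why network latency and packet loss become the limiting factor once storage and computing are accelerated.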

RDMA was first carried on lossless InfiniBand networks. RDMA itself lacks a complete packet loss protection mechanism and is therefore sensitive to packet loss. Because openness and simplified O&M are required, RDMA needs to be carried on Ethernet. However, traditional Ethernet is a best-effort network that is prone to packet loss, so it is not ideal for carrying RDMA. Huawei’s CloudEngine 16800 incorporates a high-performance AI chip and the iLossless algorithm to build a data center network oriented to the AI era. It offers the following innovations:

• Single-flow local optimization, which works like intelligent traffic lights at an urban intersection that maximize efficiency by dynamically adjusting signal timing in response to pedestrian and vehicle behavior. The CloudEngine 16800, with its embedded AI chip and iLossless algorithm, detects the network status in real time and intelligently adjusts the dynamic Explicit Congestion Notification (ECN) threshold of the switching queues as well as the queue buffer. The CloudEngine 16800 provides the fastest feedback to the transmit end so that the source end can dynamically adjust its transmit rate.

• Global optimization of the entire network, which is comparable to a city’s “traffic brain” that dynamically adjusts all traffic lights in response to pedestrian and vehicle behavior to optimize traffic conditions across the entire city. The CloudEngine 16800, with its embedded AI chip and iLossless algorithm, learns from and trains on network-wide traffic in real time. The switch dynamically sets optimal network parameters based on the characteristics of diverse service traffic models, controls traffic with precision, and automatically optimizes the network globally for millions of flows and application-based queues in various scenarios. This ensures that the intelligent and lossless data center network achieves the highest throughput while preventing packet loss. The CloudEngine 16800 overcomes the computing power limitations caused by packet loss on traditional Ethernet, increasing AI computing power from 50 percent to 100 percent and improving data storage Input/Output Operations Per Second (IOPS) by 30 percent.
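To make the idea of a dynamically adjusted ECN threshold more concrete, here is a minimal sketch of a threshold controller for a single queue. It is not the iLossless algorithm, whose details are not given in this article; it is a simple hand-written heuristic, and every name and constant in it (`QueueSample`, `adjust_ecn_threshold`, `should_mark_ecn`, the 25 percent step, the 20 KB and 2000 KB limits) is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass
class QueueSample:
    """One hypothetical telemetry sample for a switch egress queue."""
    depth_kb: float      # current queue occupancy in kilobytes
    utilization: float   # egress link utilization in the range [0, 1]


def should_mark_ecn(sample, threshold_kb):
    """Mark packets with ECN when the queue is deeper than the threshold."""
    return sample.depth_kb > threshold_kb


def adjust_ecn_threshold(threshold_kb, sample,
                         min_kb=20.0, max_kb=2000.0, step=0.25):
    """Return an updated ECN threshold (illustrative heuristic only).

    - Queue deeper than the threshold: congestion is building, so lower
      the threshold and signal senders earlier.
    - Link underused and queue shallow: raise the threshold to favor
      throughput over early marking.
    - Otherwise: leave the threshold unchanged.
    """
    if sample.depth_kb > threshold_kb:
        new_kb = threshold_kb * (1 - step)
    elif sample.utilization < 0.9 and sample.depth_kb < 0.5 * threshold_kb:
        new_kb = threshold_kb * (1 + step)
    else:
        new_kb = threshold_kb
    # Keep the threshold inside the assumed hardware limits.
    return max(min_kb, min(max_kb, new_kb))


# Example: a congested queue pulls the threshold down.
threshold = adjust_ecn_threshold(400.0, QueueSample(depth_kb=500.0, utilization=1.0))
print(threshold)
```

A static threshold has to trade latency against throughput once for all traffic patterns; a controller of this kind re-evaluates the trade-off on every sample, which is the behavior the traffic-light analogy describes. A production system would presumably learn the adjustment policy from traffic rather than use fixed rules.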

In one Internet enterprise’s autonomous driving AI training, one day’s worth of data used to require seven days of training. Now, the same training can be completed within four days, accelerating the commercial use of AI applications.
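The improvement in this example can be expressed as a speedup factor and a fraction of time saved. The short calculation below uses only the seven-day and four-day figures quoted above; the helper names `speedup` and `time_saved` are chosen here for illustration.

```python
def speedup(before_days, after_days):
    """How many times faster the job completes."""
    return before_days / after_days


def time_saved(before_days, after_days):
    """Fraction of the original training time that is eliminated."""
    return (before_days - after_days) / before_days


before_days, after_days = 7, 4
print(f"speedup: {speedup(before_days, after_days):.2f}x")
print(f"time saved: {time_saved(before_days, after_days):.0%}")
```

Because the training finishes "within" four days, these values (a 1.75x speedup and about 43 percent less training time) are lower bounds on the improvement.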

CloudEngine 16800 is the industry’s first switch with an embedded high-performance AI chip that uses the innovative iLossless algorithm. It will redefine data center switches in the AI era, reach an AI computing power of 100 percent in data centers (AI incubators), and lead data centers into the AI era.
