CloudFabric: Leading DCNs into the Intelligence Era

2020-06-16

The world is entering the era of the digital economy and leaving the industrial era behind, driven by the emergence and rapid development of Information and Communications Technology (ICT). According to a survey conducted by global research and advisory firm Gartner, 75% of large enterprises have already shifted their principal strategic focus to digital transformation. While the most critical production elements were land and labor in the agricultural era, and capital and technology in the industrial era, data and intelligence have now taken their place in the digital economy. A significant amount of data has been, and is still being, generated during digital transformation, and this data has become a core asset for enterprises.

However, data is not an end in and of itself. Rather, it is knowledge and wisdom that remain our true pursuits. In this context, the focus of enterprise digital transformation is how to harness the power of Artificial Intelligence (AI) to gain true wisdom from transient data, and ultimately monetize such data. Consequently, AI has become the key driving force for enterprises to reshape their business models, improve their customer experience, and redefine their futures. +AI signifies a key milestone for enterprise digital transformation in the intelligence era.

AI is driving Data Center (DC) reconstruction, and Data Center Networks (DCNs) now face new challenges. Indeed, the intelligent upgrade of enterprises means that DCs are transitioning from the cloud era into the AI era. Compared with traditional DCs, cloud DCs resemble service support centers, with applications at the core and IT resources efficiently provisioned through the cloud platform. Building on this foundation, an AI DC goes further still, evolving into a business value center that focuses on how to efficiently process data using AI.

Running AI efficiently unquestionably requires enormous computing power. For example, a typical AI training job for speech recognition involves 20E (1E = 10^18) floating-point operations. Even on the world's most powerful supercomputer, this would still take a prolonged period of time. Such stringent requirements for AI computing power are the driving force behind the evolution of DC architecture. The emerging architecture of the intelligence era is characterized by all-flash storage data lakes at the core, with Graphics Processing Unit (GPU) and AI diversified computing acting as the computing base. Storage and computing facilities are both undergoing drastic changes: all-flash storage has improved storage performance 100-fold, and GPU/AI intelligent computing has likewise improved computing performance 100-fold.
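To put the 20E figure in perspective, the following minimal sketch divides the total operation count by a sustained throughput. Only the 20E value comes from the text. The throughput values are hypothetical placeholders chosen for illustration, not measurements of any real system, and sustained throughput on a real workload is usually well below a machine's peak rating.

```python
# Rough time-to-train estimate: total operations / sustained throughput.
# Only TOTAL_OPS comes from the article; every rate below is hypothetical.

TOTAL_OPS = 20 * 10**18  # 20E floating-point operations (1E = 10^18)


def training_time_seconds(total_ops: float, sustained_flops: float) -> float:
    """Return the seconds needed to execute total_ops at a sustained rate."""
    if sustained_flops <= 0:
        raise ValueError("sustained_flops must be positive")
    return total_ops / sustained_flops


# Hypothetical sustained rates in floating-point operations per second.
HYPOTHETICAL_RATES = {
    "100 TFLOPS": 100e12,
    "1 PFLOPS": 1e15,
    "10 PFLOPS": 1e16,
}

if __name__ == "__main__":
    for label, rate in HYPOTHETICAL_RATES.items():
        hours = training_time_seconds(TOTAL_OPS, rate) / 3600
        print(f"{label}: {hours:.2f} hours")
```

The estimate ignores data loading and inter-node communication, which is exactly the overhead that the network discussion in this article is concerned with.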

While the running efficiency of a single server is improved by faster processors and storage media, the running efficiency of the entire DC depends additionally on DCN performance. Indeed, DCNs have become the impetus for unleashing DC computing power and monetizing data value. As an enabling technology in the intelligence era, AI presents both new opportunities and challenges for DCNs seeking to complete intelligent upgrades and improve deployment and Operations and Maintenance (O&M) efficiency.

CloudFabric Upgrade for the AI-Powered Intelligence Era

As the key to unlocking the gold mine that data represents, AI is essential to the success of enterprise digital transformation and intelligent upgrade. The pervasive use of AI technologies has driven disruptive changes in the mission of enterprise DCs. As AI technologies are widely used in DCs, Huawei has upgraded its CloudFabric solution, helping enterprises overcome these new challenges.

World's Highest-Density 400 GE DCN, Connecting Enterprises to the Intelligence Era

Enterprise digitalization is driving an exponential increase in global data volume every year. The Huawei Global Industry Vision (GIV) predicts that data volumes will reach 180 ZB by 2025, a 20-fold increase in just a few years. Today's 100 GE DCNs cannot meet the challenges posed by the data volume surges expected over the next few years. Additionally, 100 GE Network Interface Cards (NICs) have become the standard configuration on mainstream AI servers in the industry, so the switches that aggregate these servers need faster links than the servers themselves. This indicates that the 400 GE era has arrived.

In 2019, Huawei launched CloudEngine 16800, the industry's first DC switch designed specifically for the AI era. CloudEngine 16800 upgrades the hardware switching platform and makes breakthroughs in multiple fields, achieving ultra-high-speed signal transmission, superior heat dissipation, and efficient power supply based on an orthogonal architecture. It provides the industry's highest-density 48-port 400 GE line card in a single slot and the industry's largest switching capacity, with 768 400 GE ports per switch. With five times the industry average switching capacity, CloudEngine 16800 easily satisfies the traffic multiplication requirements of the AI era.
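The port figures above can be checked with simple arithmetic. The sketch below derives the number of line-card slots and the aggregate port bandwidth implied by 48 ports per card and 768 ports per chassis. The derived values are computed here and are not quoted from the article, and the bandwidth is counted in one direction only (vendors often quote bidirectional figures).

```python
# Arithmetic implied by the port counts in the text.
# Inputs are from the article; the derived values are computed here.

PORTS_PER_LINE_CARD = 48   # 400 GE ports per line card (from the text)
PORTS_PER_CHASSIS = 768    # 400 GE ports per switch (from the text)
PORT_SPEED_GBPS = 400      # 400 GE = 400 Gbit/s per port


def implied_slots(ports_per_chassis: int, ports_per_card: int) -> int:
    """Number of fully populated line-card slots implied by the port counts."""
    if ports_per_chassis % ports_per_card != 0:
        raise ValueError("port counts do not divide evenly")
    return ports_per_chassis // ports_per_card


def aggregate_gbps(ports: int, speed_gbps: int) -> int:
    """Aggregate one-direction port bandwidth in Gbit/s."""
    return ports * speed_gbps


slots = implied_slots(PORTS_PER_CHASSIS, PORTS_PER_LINE_CARD)
total_gbps = aggregate_gbps(PORTS_PER_CHASSIS, PORT_SPEED_GBPS)

if __name__ == "__main__":
    print(f"implied line-card slots: {slots}")
    print(f"aggregate port bandwidth: {total_gbps / 1000:.1f} Tbit/s per direction")
```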

Industry's First Zero Packet Loss Ethernet, Unleashing Full Computing Power Potential in the Intelligence Era

The core of the intelligence era is the use of AI to mine data value. AI computing, characterized by deep learning, depends on the input of massive data, and data access speed directly affects computing power. Improvements in both computing and storage performance, however, further aggravate the congestion and packet loss issues on traditional networks. In the AI era, even 0.1% packet loss will directly cause computing power to decrease by nearly 50%.

Even worse, packet loss will become more serious as the service load and distributed computing traffic increase. Moreover, because computing power of AI DCs is so expensive, insufficient computing power has become a major challenge. Even when computing power is available, it cannot be fully used due to network bottlenecks. Building a lossless DCN, therefore, has become a priority for many in the AI era.
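One way to see why a small loss rate can have a large effect is a toy model of a synchronized training step. This is an illustrative sketch only: it is not the model behind the 50% figure above, and the packet count, step time, and stall time are hypothetical. It assumes that losses are independent, that a step cannot finish until every packet has arrived, and that any loss stalls the whole step for a fixed recovery time.

```python
# Toy model: effect of packet loss on a synchronized training step.
# All parameter values used below are hypothetical.


def prob_step_hit(loss_rate: float, packets_per_step: int) -> float:
    """Probability that at least one packet in a step is lost,
    assuming each packet is lost independently with probability loss_rate."""
    return 1.0 - (1.0 - loss_rate) ** packets_per_step


def step_efficiency(loss_rate: float, packets_per_step: int,
                    ideal_step_ms: float, stall_ms: float) -> float:
    """Ratio of the loss-free step time to the expected step time,
    where any loss adds one fixed stall of stall_ms to the step."""
    expected_ms = ideal_step_ms + prob_step_hit(loss_rate, packets_per_step) * stall_ms
    return ideal_step_ms / expected_ms


if __name__ == "__main__":
    # Hypothetical step: 1000 packets, 10 ms ideal time, 10 ms stall on loss.
    for loss in (0.0, 0.0001, 0.001, 0.01):
        p_hit = prob_step_hit(loss, 1000)
        eff = step_efficiency(loss, 1000, 10.0, 10.0)
        print(f"loss={loss:.4%}  P(step stalled)={p_hit:.3f}  efficiency={eff:.3f}")
```

Because the per-step stall probability grows with the number of packets in the step, a per-packet loss rate of 0.1% already stalls most steps in this toy setup, which is consistent with the observation that loss becomes more serious as distributed computing traffic increases.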

Huawei CloudEngine 16800, the industry's first DC switch equipped with high-performance AI chips, features an innovative iLossless algorithm that implements adaptive traffic model optimization. Intelligent and lossless DCNs built on CloudEngine switches implement zero packet loss on Ethernet, fully unleashing the potential of AI computing power. As verified by the Tolly Group, an independent testing and validation company, Huawei's intelligent and lossless DCN achieves 27% higher AI training efficiency than other networks in the industry when the same GPU cluster is used.

Huawei's intelligent and lossless DCN has been applied to the Atlas 900 AI training cluster, which boasts the world's highest computing power. Indeed, the intelligent and lossless DCN was key to enabling Huawei to break through the performance bottleneck and set a new world record. Besides being a high-performance network for AI training clusters, Huawei's intelligent and lossless DCN is also a next-generation network architecture for DCs in the intelligence era.

DCs are constantly growing in scale, and their structure is becoming increasingly complex. This is the motivation for the autonomous driving DC, which first implements full intelligence of the network before advancing toward autonomy and self-healing. The Operating Expenditure (OPEX) of some DCs may even be three times higher than the Capital Expenditure (CAPEX), so the efficiency and cost of DCs face structural challenges. Even when mainstream SDN is used to automate network deployment, administrators still need to understand service intents, perform routine network inspections, and locate and rectify faults.

Huawei was the first to propose the autonomous driving network concept. Based on a Software-Defined Networking (SDN) architecture, Huawei introduced AI technologies into the End-to-End (E2E) process of planning, deployment, running, maintenance, optimization, and operation for network devices, network management and control, and upper-layer service orchestration systems. With AI, networks evolve from automated service deployment and action execution to intelligent fault self-healing, network self-optimization, and network autonomy, free from any manual intervention.

The fully intelligent, AI-powered CloudFabric solution can already implement a preliminary level of intelligent understanding of service intents. It can further implement intelligent selection of the optimal network path, intelligent evaluation of change risks, intelligent fault detection, and efficient location of root causes. For 75 types of common faults, the solution can detect faults within one minute, locate them within three minutes, and rectify them within five minutes. The solution implements the industry's first L3 autonomous driving network in the DCN field, as certified by Tolly.

New CloudFabric, Leading DCNs to the Intelligence Era

In the year 2000, with the development of enterprise informatization strategies, real enterprise DCs were born.

In 2010, Huawei proposed the enterprise digitization strategy. As cloud computing boomed, Huawei took the lead in releasing the industry's first cloud DCN — CloudFabric — leading DCs into the cloud era and realizing the elastic scaling and automatic provisioning of IT resources.

Currently, enterprise digital transformation has entered a new phase of intelligent upgrade. As AI is beginning to be widely adopted in DCs, Huawei has upgraded the CloudFabric solution. Huawei CloudFabric is now the first to offer full intelligence for DCNs and implement the industry's first L3 autonomous driving network. In addition, Huawei CloudFabric uses the world's highest-density 400 GE CloudEngine switches with embedded AI chips and an innovative iLossless algorithm. The solution boasts the industry's only intelligent and lossless DCN with zero packet loss, which unleashes the full computing power potential of AI. It enables AI services to run more efficiently while fully monetizing the value of data, leading DCNs into the era of intelligence.

Data has become the core factor of production driving economic growth, and whoever has the leading "data infrastructure" can gain an edge. DCs have become a strategic high ground for the digital economy. To that end, enterprises are prioritizing the optimization of DCs to more effectively unlock computing power potential and data value.
