A recent paper exploring Huawei's intelligent and lossless hyper-converged Data Center Network (DCN) innovations, "ACC: Automatic ECN Tuning for High-Speed Datacenter Networks," has been accepted at the Association for Computing Machinery (ACM)'s flagship annual event, the Special Interest Group on Data Communication (SIGCOMM) 2021 conference. The acceptance shows that Huawei's innovations are highly regarded by industry experts and have a far-reaching impact worldwide.
SIGCOMM is currently the leading international conference in the field of communication networks, well known for its strict requirements on paper quality and its extremely low acceptance rate. Accepted papers are expected to make fundamental contributions, show leadership in their area, and rest on a solid systems foundation. It's no surprise, then, that they are often widely cited and highly influential in their domain.
This landmark paper focuses on the Automatic Explicit Congestion Notification (ECN) technique, ACC for short, one of Huawei's major intelligent and lossless Ethernet innovations. With the adoption of Artificial Intelligence (AI) and cloud services gathering pace, data centers need to support a growing number of bandwidth-hungry and latency-sensitive applications, including big data processing, distributed storage, and High-Performance Computing (HPC). However, this rapid growth in applications will, in turn, multiply the risk of traffic congestion. Currently, ECN is the most common way to control traffic congestion, and the ECN marking threshold, the queue depth at which a switch starts marking packets to signal congestion, is key to delivering high bandwidth and low latency.
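To make the role of the marking threshold concrete, here is a minimal sketch of a RED-style ECN marking rule of the kind commonly found on data center switches. The parameter names (kmin_kb, kmax_kb, pmax) and the numbers used are assumptions for illustration only; they are not taken from the paper or from any specific Huawei product.

```python
def ecn_mark_probability(queue_kb, kmin_kb, kmax_kb, pmax):
    """Probability that an arriving packet is ECN-marked (illustrative only).

    queue_kb : current queue depth in KB
    kmin_kb  : below this depth nothing is marked (assumed parameter name)
    kmax_kb  : at or above this depth every packet is marked (assumed name)
    pmax     : marking probability reached just below kmax_kb (assumed name)
    """
    if kmin_kb >= kmax_kb:
        raise ValueError("kmin_kb must be smaller than kmax_kb")
    if queue_kb <= kmin_kb:
        return 0.0
    if queue_kb >= kmax_kb:
        return 1.0
    # Linear ramp between the two thresholds.
    return pmax * (queue_kb - kmin_kb) / (kmax_kb - kmin_kb)


# A shallow queue is left alone; a deep queue is always marked.
for depth in (50, 250, 500):
    print(depth, ecn_mark_probability(depth, kmin_kb=100, kmax_kb=400, pmax=0.2))
```

Under this rule, low thresholds mark early, which keeps queues and latency short but risks leaving bandwidth unused. High thresholds do the opposite, which is why the setting matters so much.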
However, static ECN settings are no longer sufficient in the digital era. Because service traffic changes dynamically, maintaining consistent performance with static ECN settings is very difficult, so manual ECN tuning is needed. This approach, however, creates a heavy workload on large, high-speed, multi-tenant networks. For example, manually tuning ECN on a large heterogeneous DCN built from switches and network adapters from multiple vendors is a major challenge. Likewise, on a multi-tenant cloud network with an extensive range of spatiotemporal traffic models, ECN settings must be retuned manually to adapt to traffic loads that change over time, a process that is time-consuming, labor-intensive, and ultimately ineffective.
This is where ACC steps in. Jointly designed and developed by Huawei and Nanjing University, this algorithm integrates Deep Reinforcement Learning (DRL) into ultra-high-speed data center switches for the first time, automatically adapting to ever-changing traffic models by dynamically adjusting the marking threshold on each switch. The result? Simplified network operations, maximum network utilization, and minimum network latency.
To elaborate, ACC uses a distributed multi-agent DRL technique, which makes it easier to scale out to large networks. It can also flexibly adapt to dynamic traffic patterns by combining offline and online training. In addition, the technology can be easily deployed because it relies on common features supported by major commercial switching chips. All of this makes ACC excel at maximizing network throughput and application performance, while minimizing transmission latency on large, high-speed DCNs.
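The idea of a per-switch agent that is trained offline and then keeps learning online can be sketched as follows. This is a deliberately simplified illustration, not the ACC algorithm: a tabular, bandit-style value update stands in for the deep RL model, and the toy switch model, reward function, traffic states, and candidate thresholds are all hypothetical.

```python
import random

# Hypothetical candidate ECN marking thresholds (in KB) an agent may choose from.
THRESHOLDS_KB = [20, 50, 100, 200, 400]
LIGHT, HEAVY = 0, 1  # hypothetical, coarsely discretized traffic states


def toy_switch(threshold_kb, load):
    """Hypothetical toy model of one switch queue, not a real simulator.

    Returns (link utilization, standing queue in KB) for a threshold and load.
    """
    needed_kb = 50 if load == LIGHT else 200  # threshold that keeps the link full
    utilization = min(1.0, threshold_kb / needed_kb)
    queue_kb = float(threshold_kb)  # a higher threshold lets a longer queue build
    return utilization, queue_kb


def reward(utilization, queue_kb, max_queue_kb=1000.0):
    """Assumed reward: favor high utilization, penalize queueing delay."""
    return utilization - queue_kb / max_queue_kb


class SwitchAgent:
    """One agent per switch; a value table stands in for the deep RL model."""

    def __init__(self, alpha=0.5, epsilon=0.1, seed=0):
        self.value = {s: [0.0] * len(THRESHOLDS_KB) for s in (LIGHT, HEAVY)}
        self.alpha = alpha      # learning rate
        self.epsilon = epsilon  # exploration rate used online
        self.rng = random.Random(seed)

    def learn(self, state, action, r):
        # Move the stored estimate toward the observed reward.
        row = self.value[state]
        row[action] += self.alpha * (r - row[action])

    def best_threshold(self, state):
        row = self.value[state]
        return THRESHOLDS_KB[row.index(max(row))]

    def choose(self, state):
        # Online action: mostly greedy, occasionally explore.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(THRESHOLDS_KB))
        row = self.value[state]
        return row.index(max(row))


def train_offline(agent, epochs=20):
    """Offline phase: sweep every state/threshold pair on the toy model."""
    for _ in range(epochs):
        for load in (LIGHT, HEAVY):
            for action, threshold in enumerate(THRESHOLDS_KB):
                agent.learn(load, action, reward(*toy_switch(threshold, load)))


def run_online(agent, loads):
    """Online phase: act on observed load and keep updating the estimates."""
    chosen = []
    for load in loads:
        action = agent.choose(load)
        threshold = THRESHOLDS_KB[action]
        agent.learn(load, action, reward(*toy_switch(threshold, load)))
        chosen.append(threshold)
    return chosen


agent = SwitchAgent()
train_offline(agent)
print(agent.best_threshold(LIGHT), agent.best_threshold(HEAVY))
print(run_online(agent, [LIGHT, HEAVY, HEAVY, LIGHT]))
```

In this toy model the agent settles on a lower threshold under light load than under heavy load. In a distributed setup, each switch would hold its own SwitchAgent instance and act only on its local observations, which is what allows the approach to scale out.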
Powered by ACC, Huawei's CloudFabric 3.0 Hyper-Converged DCN Solution ensures high throughput and low latency. The solution converges the transmission of diverse traffic and unifies the siloed architectures of general-purpose computing, storage, and HPC networks into an all-Ethernet architecture. This enables lossless computing and storage networks and unleashes 100% of computing power.
The solution also enables the industry's first L3 Autonomous Driving Network (ADN), achieving full-lifecycle automated network management, network-wide intelligent Operations and Maintenance (O&M), and intelligent upgrades for enterprises, which in turn reduces Operating Expenditure (OPEX) by 30%.
Huawei's DCNs have been deployed across a wide range of industries, including the finance, government, Internet Service Provider (ISP), manufacturing, and energy sectors. Huawei will continue to focus on intelligent and lossless network research to improve network performance, fully unleash computing power, and enable the intelligent upgrade of enterprises.
Disclaimer: The views and opinions expressed in this article are those of the author and do not necessarily reflect the official policy, position, products, and technologies of Huawei Technologies Co., Ltd. If you need to learn more about the products and technologies of Huawei Technologies Co., Ltd., please visit our website at e.huawei.com or contact us.