Addressing New DCI Challenges in the Cloud Era

2019-09-10

Traffic pressure facing data center interconnect (DCI): Cloud services are the IT industry's version of the sharing economy, and data centers are the carriers of these services. Because land and equipment room space are limited, data centers are increasingly deployed in a distributed manner, and WDM technology is required to connect them so that they can provide unified services. As more and more services migrate to the cloud, DCI traffic keeps growing, posing great challenges to DCI solutions. In an economy that prizes intensive use of resources, cloud operators continuously pursue maximum utilization of land, equipment room space, energy, and optical fiber resources. Many customers opt to rent optical fibers and cabinets. The monthly rental of optical fibers in an Asia Pacific metro network comes to US$70 to US$85 per kilometer, while the rental of backbone network fibers reaches as much as US$150. Meanwhile, each cabinet often costs between US$850 and US$1,150 to rent (including electricity and air conditioning fees). DCI is therefore required to maximize the value of optical fibers and cabinet space.
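To make the rental pressure concrete, the following minimal sketch divides a monthly fiber rent by the capacity carried on that fiber. The 50 km link length and the capacity values are hypothetical, chosen only for illustration, and the sketch assumes the quoted per-kilometer price applies to the whole leased link.

```python
def monthly_fiber_rent_usd(length_km: float, rent_usd_per_km: float) -> float:
    """Monthly rent for one leased fiber link."""
    return length_km * rent_usd_per_km


def rent_per_tbps_usd(length_km: float, rent_usd_per_km: float,
                      capacity_tbps: float) -> float:
    """Monthly fiber rent divided by the capacity carried on the fiber."""
    return monthly_fiber_rent_usd(length_km, rent_usd_per_km) / capacity_tbps


# Hypothetical 50 km metro link at the low end of the quoted price range.
LINK_KM = 50
RENT_USD_PER_KM = 70

for capacity_tbps in (8, 16, 24):  # illustrative capacities, not measured values
    cost = rent_per_tbps_usd(LINK_KM, RENT_USD_PER_KM, capacity_tbps)
    print(f"{capacity_tbps} Tbit/s -> US${cost:.2f} per Tbit/s per month")
```

The rent is fixed by distance, so every additional Tbit/s carried on the same fiber lowers the cost per unit of capacity.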

Continuous pursuit of DCI solutions with higher integration: Because data centers carry such heavy traffic, they are also a focus of technological innovation. The primary goal is to construct high-speed, non-blocking DCI networks.

On the one hand, fiber spectrum resources need to be fully utilized by expanding from the current C band to the Super C band, upgrading from 80 wavelengths to 120 wavelengths in the process. This requires technical breakthroughs in hardware design, algorithms, and system components such as chips, lasers, receivers, and amplifiers.

On the other hand, the transmission capacity of each wavelength needs to be improved. Currently, single-wavelength 100G/200G is the mainstream commercial DCI technology, while single-wavelength 400G/600G has been successfully put into commercial use. Single-wavelength 800G is expected to be put into commercial use sometime in 2020. The key to high-speed transmission on a single wavelength is using algorithms to improve transmission performance and deployability on physical channels. In terms of integration, the industry favors deploying the optical layer and the electrical layer in separate chassis. The capacity of each rack unit (1 U) is between 1.2 Tbit/s and 2.4 Tbit/s. In the future, as the single-wavelength rate increases, equipment density will double. Simplifying the optical layer and exploring DCI device forms with optical-electrical integration are also areas for future development.
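The two levers described above, more wavelengths per fiber and a higher rate per wavelength, multiply together. The sketch below works through that arithmetic for a few combinations of the figures quoted in the text. It is a simplification: it assumes the wavelength count is independent of the per-wavelength rate, whereas in practice higher rates typically need wider channel spacing, so the larger products should be read as upper bounds rather than achievable capacities.

```python
def fiber_capacity_tbps(wavelengths: int, rate_gbps: int) -> float:
    """Nominal capacity of one fiber in Tbit/s (wavelength count x rate)."""
    return wavelengths * rate_gbps / 1000


# (label, wavelength count, per-wavelength rate in Gbit/s)
scenarios = [
    ("C band, 100G per wavelength", 80, 100),
    ("C band, 200G per wavelength", 80, 200),
    ("Super C band, 200G per wavelength", 120, 200),
]

for label, wavelengths, rate_gbps in scenarios:
    capacity = fiber_capacity_tbps(wavelengths, rate_gbps)
    print(f"{label}: {capacity:.1f} Tbit/s")
```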

Personnel deficiencies in DCI deployment: While existing enterprise DCI O&M personnel are proficient in information technology (IT), they are less familiar with specialized WDM technology. In addition, available DCI O&M manpower is often insufficient. For example, some ISP DCI O&M teams have only a dozen personnel managing thousands of devices worldwide. Work is usually outsourced to third-party personnel, who in turn lack the professional WDM knowledge necessary for remote site deployment, capacity expansion, and routine onsite maintenance.

Time-consuming deployment of traditional WDM equipment: Site installation, fiber connection, configuration, commissioning, and service provisioning take significant time. According to typical customer statistics, this can often extend to several weeks. Meanwhile, cloud services are developed and rolled out rapidly, and reconstruction and capacity expansion are daily tasks. This is extremely problematic for O&M personnel. The error rate of traditional manual fiber connection can reach 5%, and once a fiber is incorrectly connected, services are interrupted. Troubleshooting, cross-checking, and verification then become even more time-consuming and labor-intensive. Once physical fibers have been connected correctly, logical fibers also need to be connected on the network management system (NMS) one by one.
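A rough calculation shows why a 5% error rate matters at site scale. The sketch below assumes that the 5% figure applies to each fiber connection and that errors are independent; both are assumptions made for illustration, not statements from the customer statistics. The fiber count of 20 is likewise hypothetical.

```python
def p_any_misconnection(p_per_fiber: float, fiber_count: int) -> float:
    """Probability that at least one of fiber_count connections is wrong,
    assuming independent errors with probability p_per_fiber each."""
    return 1.0 - (1.0 - p_per_fiber) ** fiber_count


# Hypothetical site with 20 manual fiber connections.
risk = p_any_misconnection(0.05, 20)
print(f"Chance of at least one misconnected fiber: {risk:.1%}")
```

Under these assumptions, a site with a few dozen manual connections is more likely than not to contain at least one error, which is what makes the subsequent cross-checking so costly.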

Due to the complex and precise nature of optical systems, service rollout requires dozens of professional configuration and commissioning steps. Steps such as configuring basic communication parameters for NEs on the NMS, configuring wavelengths, commissioning optical power, and adjusting optical power flatness affect one another across the sites and modules of an end-to-end WDM system. This requires professional knowledge and extensive experience. Manual commissioning is also painstaking and time-consuming, and enterprises often cannot perform these tasks themselves. Simpler operations are therefore needed.

Reducing specialization for faster deployment: Ideally, O&M personnel should be freed from complex and specialized deployment requirements. Every step in the deployment process should be simplified and automated, including the optical layer, where unnecessary fiber connections should be eliminated. In addition, logical fiber connections on the NMS should be automatically discovered, parameters such as wavelengths automatically matched, and professional commissioning automatically performed. In the future, fiber connections, configurations, and commissioning will be simplified and automated as much as possible in order to implement one-click deployment, shortening the deployment period from weeks to minutes and supporting quick service cloudification and frequent capacity expansion.
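As an illustration of what automatic discovery of logical fiber connections could look like, the sketch below pairs up per-port neighbor reports and flags ports whose far end does not report back. The `PortReport` structure, the port naming, and the idea that each port can identify its neighbor are all hypothetical; the article does not describe the actual discovery mechanism.

```python
from typing import NamedTuple, Optional


class PortReport(NamedTuple):
    local: str               # hypothetical port id, e.g. "siteA/1"
    neighbor: Optional[str]  # port id detected on the far end, or None


def discover_links(reports):
    """Build logical links from mutual neighbor reports and list problems."""
    seen = {r.local: r.neighbor for r in reports}
    links, problems = set(), []
    for local, neighbor in seen.items():
        if neighbor is None:
            problems.append((local, "no neighbor detected"))
        elif seen.get(neighbor) != local:
            # The far end is unknown or reports a different neighbor.
            problems.append((local, f"unconfirmed link to {neighbor}"))
        else:
            links.add(tuple(sorted((local, neighbor))))
    return sorted(links), problems


reports = [
    PortReport("siteA/1", "siteB/1"),
    PortReport("siteB/1", "siteA/1"),
    PortReport("siteA/2", "siteB/3"),  # far end never reports back
    PortReport("siteB/2", None),       # dark port
]
links, problems = discover_links(reports)
print("links:", links)
print("problems:", problems)
```

Only links confirmed from both ends are created automatically; everything else is surfaced for a person to check, which replaces the one-by-one manual entry on the NMS with a short exception list.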

Slow fault identification and severe fault impact: More and more applications are running on the cloud, while cloud service providers are themselves launching hundreds of services. In such a scenario, the impact of DCI faults is a serious concern. As was recently reported, a multinational cloud service provider encountered a global multi-node fault, leaving multiple SaaS providers unable to deliver their services. In the customer's IT system architecture, the DCI network is the underlying foundation and connects upstream to DCN switches, storage servers, and IT applications. These upstream systems usually detect a service interruption first. Once a fault occurs, various departments immediately raise complaints. The DCI side, however, is late to detect the fault.

Need for more intelligent O&M modes: To change O&M from manual to automatic, and from passive to active, intelligent O&M is the optimal choice. Big data must be brought in first, with ubiquitous optical sensors deployed to monitor the network in real time at a granularity of seconds. In addition, powerful chip computing capabilities and decision-making algorithms need to be deployed to predict network health status. Once a fault occurs, these capabilities can be invoked together with matching algorithms to determine the root alarm, enabling accurate fault location. The intelligent O&M framework can provide more applications in the future, enabling easy O&M for IT personnel.
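One simple way to picture root alarm determination is dependency-based suppression: among all alarmed elements, keep only those whose upstream dependencies are healthy. The sketch below uses that technique as a stand-in; it is not the matching algorithm the article refers to, and the element names and dependency map are hypothetical.

```python
def root_alarms(depends_on, alarmed):
    """Return alarmed elements with no alarmed direct upstream dependency."""
    roots = set()
    for element in alarmed:
        upstream = depends_on.get(element, ())
        if not any(u in alarmed for u in upstream):
            roots.add(element)
    return sorted(roots)


# Hypothetical dependency chain, from the fiber up to an IT application.
depends_on = {
    "amplifier-1": ["fiber-span-1"],
    "otu-port-3": ["amplifier-1"],
    "dcn-switch-7": ["otu-port-3"],
    "storage-app": ["dcn-switch-7"],
}
alarmed = {"fiber-span-1", "amplifier-1", "otu-port-3",
           "dcn-switch-7", "storage-app"}

print(root_alarms(depends_on, alarmed))
```

The check looks only at direct dependencies, so it relies on alarms propagating along the whole chain. A production system would also need to handle missing or delayed alarms, which is where sensor data and prediction come in.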
