As digital transformation gains momentum across all industries, a variety of new digital services and network requirements are emerging. To adapt to ever-changing service requirements, network policies need to be changed and adjusted rapidly. In the digital era, however, enterprise networks are larger and more complex than ever, with the number of wireless network users increasing exponentially and wired network ports numbering in the tens of thousands. A single-point configuration can affect the entire network, making it increasingly difficult to evaluate the impact of network changes on services. As a result, network changes have become one of the major root causes of network faults and accidents in recent years.
According to the data provided by the Information Technology (IT) department of a well-known enterprise in China, around 30 network changes are made every month on average, with network change errors causing 43% of faults, in some cases even interrupting services entirely. In addition, many customers in the financial services industry note that their network Operations and Maintenance (O&M) is centered around network changes, and they generally believe that more than 70% of network faults are caused by said changes.
Since network changes are directly related to service operations, a successful change must ensure not only that new services are provisioned smoothly, but also that existing services are not affected and that no problems are introduced. O&M personnel are therefore increasingly cautious about making network changes: before any change, they need to review the solution design and evaluate all change impacts. After the change, they need to perform dialing tests, monitor traffic, and manually check table entries. In addition, they even need to manually monitor the newly changed network onsite for two or three consecutive days to ensure that the change brings about the desired results.
A key challenge, then, is how to make network changes rapidly and without errors while improving the efficiency of configuration verification.
According to a survey of 315 network O&M experts conducted by California-based Dimensional Research, two traditional O&M approaches are typically used to verify configurations and locate faults during network changes: manual checks, and packet mirroring, capture, and analysis. However, these approaches have obvious disadvantages.
Inaccurate Checks: Difficult to Prevent Potential Risks
Traditionally, configuration verification relies heavily on real probe packets or live traffic on the network, so faults can be detected only after they actually occur. This makes it impossible to predict faults proactively. In addition, commands such as ping, traceroute, and tracert are typically used to send probe packets. However, access control or security policies on the network may filter services that use specific protocols or ports, and probe packets cannot fully cover these services. As a result, probe packets may fail to detect faults, leading to unreliable, inaccurate verification results and an inability to prevent faults.
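To make the coverage gap concrete, here is a minimal sketch of a first-match access control list in which an ICMP probe passes while the real service traffic is dropped. The `Rule` type, the `evaluate` function, the default-permit behavior, and the rule set are all assumptions made for illustration; they do not represent any particular device's ACL semantics.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    action: str               # "permit" or "deny"
    protocol: Optional[str]   # None matches any protocol
    dst_port: Optional[int]   # None matches any destination port


def evaluate(rules, protocol, dst_port=None):
    """First-match evaluation; traffic matching no rule is permitted (assumption)."""
    for rule in rules:
        if rule.protocol not in (None, protocol):
            continue
        if rule.dst_port not in (None, dst_port):
            continue
        return rule.action
    return "permit"


# Hypothetical policy: HTTPS is blocked by mistake, everything else is allowed.
acl = [Rule("deny", "tcp", 443), Rule("permit", None, None)]

ping_result = evaluate(acl, "icmp")        # what a ping probe experiences
https_result = evaluate(acl, "tcp", 443)   # what the real service experiences
print(ping_result, https_result)
```

In this sketch the ping succeeds even though HTTPS traffic is denied, so a probe-based check would report the path as healthy.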
Incomplete Verification: Unable to Cover the Entire Network
In essence, traditional manual verification is a form of sampling analysis. Sampling, however, has unavoidable limitations and randomness, and cannot cover all network traffic. For example, manual operations can verify only single-point connectivity and are far less effective at handling tens of thousands of application access relationships, which multiply like a Cartesian product and cannot practically be enumerated by hand. In addition, the traditional approach usually relies on packet mirroring and capture for the objects being verified. However, this cannot achieve 1:1 mirroring of all service packets and therefore cannot cover all access conditions on the network.
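The Cartesian-product growth can be illustrated with a short calculation. The sketch below counts ordered (source, destination) subnet pairs multiplied by the number of services checked per pair; the function name and the figures of 100 subnets and 5 services are hypothetical and chosen only to show the order of magnitude.

```python
from itertools import permutations


def count_access_relationships(subnet_count, service_count):
    """Ordered (source, destination) subnet pairs times services per pair."""
    return subnet_count * (subnet_count - 1) * service_count


# Hypothetical campus: 100 subnets, 5 services checked between each pair.
total = count_access_relationships(100, 5)

# Cross-check the closed form by brute-force enumeration on a tiny network.
tiny = len(list(permutations(range(4), 2))) * 3

print(total, tiny)
```

Under these assumed numbers there are 49,500 relationships to check, which is why sampling a handful of paths by hand leaves most of them unverified.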
Lengthy Testing and Troubleshooting Times
Manual verification also offers little insight into path information or the root causes of unreachability, so locating faults usually takes a long time. When a device is unreachable, O&M staff need to manually check the device's table entries and configurations one by one. Such an approach may be feasible on a small network, but on midsize or large networks with hundreds of devices and millions of table entries, it is no longer viable. Typically, it takes as long as 4 hours to verify a single change made in just 1 minute.
Huawei's CloudCampus Solution is already being widely used on the networks of more than 2000 large and midsize enterprises. With field-proven network innovations and practices, Huawei added the Autonomous Driving Network (ADN) solution for campus networks to its CloudCampus offerings. Using digital twin and Artificial Intelligence (AI) technologies, this new solution enables intelligent verification, helping enterprises reach levels of efficiency not previously possible with manual verification.
With intelligent verification, O&M personnel can define network verification intents on the Graphical User Interface (GUI) of iMaster NCE-Campus, Huawei's next-generation autonomous driving network management and control system, and then determine verification rules and policies. Supported intents include single-point reachability verification, terminal access simulation and verification, and subnet reachability verification. iMaster NCE-Campus collects all necessary data, including network device configurations, forwarding table entries, and network topology information. Based on this data, it builds a digital model that faithfully reflects the actual network architecture and forwarding behaviors, setting up a digital twin mapping with the physical network. The system then uses Data Plane Verification (DPV) technology to perform strict intent verification and fault remediation across the network, efficiently identifying network issues.
Network Snapshots Taken in Minutes, with Rapid Discovery of Configuration Changes
Using iMaster NCE-Campus, O&M staff can collect network device data in read-only mode and perform data plane modeling to generate network snapshots. Specifically, iMaster NCE-Campus can complete the mathematical modeling of 500 Network Elements (NEs), more than 150,000 routing entries, and 50,000 Virtual Local Area Networks (VLANs) in as little as 5 minutes to construct a network snapshot. A snapshot can be considered a mirror image of a network at a certain point in time and is the basis of intelligent verification. By comparing network snapshots taken at different points in time, O&M personnel can easily and rapidly discover differences in devices, configuration files, interface link statuses, and Internet Protocol (IP) routes, which facilitates fault location.
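Conceptually, comparing two snapshots amounts to a keyed diff between two point-in-time records. The following is a minimal sketch of that idea, not the iMaster NCE-Campus implementation: the flat snapshot layout keyed by (device, kind, item), the function name `diff_snapshots`, and the sample device and interface names are all assumptions made for illustration.

```python
def diff_snapshots(before, after):
    """Return entries added, removed, and changed between two snapshots."""
    added = {k: after[k] for k in after.keys() - before.keys()}
    removed = {k: before[k] for k in before.keys() - after.keys()}
    changed = {
        k: (before[k], after[k])
        for k in before.keys() & after.keys()
        if before[k] != after[k]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical snapshots keyed by (device, kind, item).
before = {
    ("SW1", "route", "10.1.0.0/16"): "GE0/0/1",
    ("SW1", "interface", "GE0/0/2"): "up",
    ("SW2", "route", "10.2.0.0/16"): "GE0/0/1",
}
after = {
    ("SW1", "route", "10.1.0.0/16"): "GE0/0/3",  # outgoing interface changed
    ("SW1", "interface", "GE0/0/2"): "up",       # unchanged
    ("SW2", "route", "10.3.0.0/16"): "GE0/0/1",  # new route; 10.2.0.0/16 is gone
}

result = diff_snapshots(before, after)
for kind, entries in result.items():
    print(kind, entries)
```

Keying each entry by device and item keeps the comparison to set operations on dictionary keys, so unchanged entries never need to be inspected individually.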
100% Coverage of Mutual Access Relationships and Intuitive Insights into Subnet Interconnectivity
Based on snapshots, subnet interconnectivity verification results can be displayed in a matrix. In addition, iMaster NCE-Campus can verify packets of multiple protocols, such as Transmission Control Protocol (TCP), User Datagram Protocol (UDP), and Internet Control Message Protocol (ICMP), and display all network reachability and multi-path connection information, giving intuitive insights into connectivity between service network segments. iMaster NCE-Campus also supports End-to-End (E2E) path analysis and can intuitively display verification results, including forwarding paths and network reachability. It does this at a granularity down to protocol numbers and port numbers, significantly improving the efficiency of locating faults.
Precise Terminal Access Verification, Ensuring Secure and Reliable User Rights
Intelligent verification provides a terminal access verification capability. This allows network administrators to perform real-time, precise simulation and verification to ensure that terminal access rights are assigned in a fine-grained manner, safeguarding network security and reliability.
Hundreds of Network Devices and Complex Topology and Forwarding Models Present a Challenge to Comprehensive and Accurate Mathematical Modeling
As networks continue to grow in scale, involving hundreds of network devices, and with topology becoming increasingly complex — including varied multi-layer forwarding models such as Layer 2, Layer 3, overlay, and underlay — how can comprehensive mathematical modeling be implemented?
Network Connection Modeling and 100% Mathematical Modeling of Physical Topologies
Using digital twin technology, intelligent verification abstracts complex physical networks as geometrical points, lines, and planes. Based on the collected device data, this technique builds a virtual digital twin network model that is equivalent to the physical network in terms of packet forwarding behaviors. For example, network devices, interfaces, Layer 2 and Layer 3 networks, and forwarding paths on the overlay and underlay networks are virtualized into digital equivalence classes, implementing a 100% digital twin mapping of physical networks using mathematical methods.
Packet Header Space Modeling and 100% Mathematical Modeling of Forwarding Behaviors
Packet forwarding behaviors are determined by packet header information. For example, IP forwarding behavior is determined by the destination IP address field, and security filtering and access control policies are closely tied to 5-tuple information. A real device can be tested only with specific, concrete packets. Internet Protocol version 4 (IPv4) is a good example: the 32-bit destination address field alone allows 2^32 (roughly 4.3 billion) values. Taking into account the full 5-tuple or even more fields, the number of combinations grows far larger still. In this context, sampling checks on specific packets cannot faithfully reflect actual forwarding behaviors.
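The size of the header space follows directly from the field widths. The sketch below adds up the standard IPv4 5-tuple field widths (source and destination address, source and destination port, protocol number); the dictionary name `FIELD_BITS` and its keys are illustrative labels only.

```python
# Bit widths of the IPv4 5-tuple fields.
FIELD_BITS = {
    "src_ip": 32,
    "dst_ip": 32,
    "src_port": 16,
    "dst_port": 16,
    "protocol": 8,
}

dst_only = 2 ** FIELD_BITS["dst_ip"]          # distinct destination addresses
five_tuple_bits = sum(FIELD_BITS.values())    # total bits across the 5-tuple
five_tuple = 2 ** five_tuple_bits             # distinct 5-tuple headers

print(f"destination addresses: {dst_only:,}")
print(f"5-tuple header space: 2^{five_tuple_bits} combinations")
```

A space of 2^104 headers cannot be covered by sending individual packets, which is why the approach described next reasons about sets of packets rather than samples.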
Intelligent verification has been developed based on the packet matching technology of network devices and achievements made in academic research into network verification. It considers a specific packet header as a point in an N-dimensional space. In this way, the original IP prefix forwarding table can be equivalently converted into a packet equivalence class forwarding table, comprehensively constructing a packet header space. Each equivalence class is identified by an integer, and the interface of each device records a collection of equivalence classes that can be forwarded. This allows efficient mathematical algorithms and data structures to be used, enabling each network function node to process a collection of packets in one go: for example, packets whose destination IP addresses belong to a certain network segment. This overcomes the constraints of a traditional sampling check, verifying packet headers with different combinations more efficiently and widely, achieving comprehensive verification on the entire network.
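One common way to derive packet equivalence classes from prefix tables is to cut the destination address space at every prefix boundary, so that each device's longest-prefix-match decision is constant within each resulting interval. The sketch below illustrates that idea for destination addresses only. It is an assumption about how such a conversion could work, not Huawei's algorithm, and the device names, routes, and function names are hypothetical.

```python
import ipaddress


def build_equivalence_classes(tables):
    """Split the IPv4 destination space at every prefix boundary."""
    bounds = {0, 2**32}
    for routes in tables.values():
        for prefix in routes:
            net = ipaddress.ip_network(prefix)
            bounds.add(int(net.network_address))
            bounds.add(int(net.broadcast_address) + 1)
    edges = sorted(bounds)
    # Class i covers the half-open address range [edges[i], edges[i + 1]).
    return list(zip(edges, edges[1:]))


def longest_prefix_match(routes, address):
    """Return the interface of the most specific matching prefix, or None."""
    best = None
    for prefix, interface in routes.items():
        net = ipaddress.ip_network(prefix)
        if ipaddress.ip_address(address) in net:
            if best is None or net.prefixlen > best[0]:
                best = (net.prefixlen, interface)
    return best[1] if best else None


def classes_per_interface(tables, classes):
    """Map (device, interface) to the set of class IDs it forwards."""
    result = {}
    for device, routes in tables.items():
        for class_id, (low, _high) in enumerate(classes):
            # No prefix boundary falls inside a class, so its lowest
            # address is representative of the whole class.
            interface = longest_prefix_match(routes, low)
            if interface is not None:
                result.setdefault((device, interface), set()).add(class_id)
    return result


# Hypothetical forwarding tables: prefix -> outgoing interface.
tables = {
    "R1": {"10.0.0.0/8": "A.1", "10.1.0.0/16": "A.2"},
    "R2": {"10.1.0.0/16": "B.1"},
}
classes = build_equivalence_classes(tables)
forwarding = classes_per_interface(tables, classes)
print(len(classes), forwarding)
```

Here two prefixes on two devices yield five integer-identified classes, and each interface stores a small set of class IDs instead of address ranges. Set intersection on those IDs then stands in for per-packet matching.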
Implementing Strict and Rapid Reachability Verification with Ever-Changing, Complex Forwarding Paths
Verification of packet forwarding behaviors must follow the packet handling process and forwarding mechanism of network devices.
Intelligent verification focuses on the packet handling behaviors of NEs and uses symbolic execution, a formal verification technique, to simulate packet forwarding behaviors symbolically. As shown in the following figure, the left side shows the entire packet space divided into six packet equivalence classes, {1,2,3,4,5,6}, based on all the forwarding table entries, with each equivalence class corresponding to one region of the packet space. The right side shows a reachability tree with interface A.1 as the root. This tree lists all the possible packet forwarding paths and destinations. With an efficient graph search algorithm and a series of more complex optimization methods, network-wide loops and black holes are comprehensively detected, and network reachability is verified in batches.
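The batch traversal can be sketched as a depth-first search that carries a set of equivalence-class IDs along each edge instead of a single packet. This is a simplified illustration using plain set propagation, not the symbolic execution engine or the optimizations mentioned above. Apart from the root A.1 and the six classes, everything here is hypothetical, including the topology, the node names B.1, C.1, host1, and host2, and the function `explore`.

```python
def explore(edges, destinations, root, classes):
    """Walk the forwarding graph from `root`, carrying a set of class IDs."""
    delivered, loops, black_holes = {}, [], []

    def visit(node, carried, path):
        if node in destinations:
            delivered.setdefault(node, set()).update(carried)
            return
        forwarded = set()
        for next_node, allowed in edges.get(node, []):
            subset = carried & allowed   # classes this edge forwards
            if not subset:
                continue
            forwarded |= subset
            if next_node in path:        # node revisited on this path: loop
                loops.append((path + [next_node], subset))
            else:
                visit(next_node, subset, path + [next_node])
        dropped = carried - forwarded    # no matching entry: black hole
        if dropped:
            black_holes.append((node, dropped))

    visit(root, set(classes), [root])
    return delivered, loops, black_holes


# Hypothetical forwarding graph: node -> [(next node, forwarded classes)].
edges = {
    "A.1": [("B.1", {1, 2, 3}), ("C.1", {4, 5})],  # class 6 has no entry
    "B.1": [("host1", {1, 2}), ("C.1", {3})],
    "C.1": [("host2", {4}), ("B.1", {3}), ("A.1", {5})],
}
delivered, loops, black_holes = explore(
    edges, {"host1", "host2"}, "A.1", range(1, 7)
)
print(delivered)
print(loops)
print(black_holes)
```

In this made-up topology, classes 1, 2, and 4 reach a host, classes 3 and 5 are caught in loops, and class 6 is dropped at A.1. Because each step intersects whole sets, one traversal covers every packet in all six classes.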
Disclaimer: The views and opinions expressed in this article are those of the author and do not necessarily reflect the official policy, position, products, and technologies of Huawei Technologies Co., Ltd. If you need to learn more about the products and technologies of Huawei Technologies Co., Ltd., please visit our website at e.huawei.com or contact us.