IDC finds that the enterprise IT environment has shifted from being technology-driven to being business-driven. Service demands now shape every phase of the business process, from production and sales to service delivery. To support this change in the role of IT, systems must meet high performance and availability requirements, and enterprises must strike a sensible balance among IT system agility, technology continuity, and cost control. Only a unified, open architecture can meet all of these requirements. The move toward openness has become an unstoppable force in enterprise IT system development. Openness is IT’s new normal.
Enterprises have long used UNIX servers based on RISC or EPIC architectures, which lock customers into a single vendor’s systems to some degree. This leads to high Operations and Maintenance (O&M) costs and makes service innovation difficult. Against this backdrop, enterprises are calling for their mission-critical computing platforms to be rebuilt, and a shift to x86 platforms is underway.
When building open mission-critical computing platforms, enterprise CIOs and CTOs are also deeply concerned with ensuring business continuity and data security. Clearly, mission-critical application platforms must provide high performance; advanced Reliability, Availability, and Serviceability (RAS); tight security; high scalability; and openness.
After in-depth studies of enterprise requirements for mission-critical computing, IDC finds that the requirements have fundamentally changed in the context of cloud computing and Big Data. Mission-critical computing requires an open platform that can rapidly grow and smoothly expand along with an enterprise, and the data management architecture must be both service-oriented and enterprise-oriented.
To meet these requirements, Huawei has launched the KunLun Mission Critical Server, which uses Intel® Xeon® E7 v3/v4 processors. KunLun has been engineered and built for mission-critical applications. It leverages a scale-up architecture to help customers build high-performance, highly reliable mission-critical computing platforms. These unified, open computing platforms enable enterprises to migrate and manage services in cloud and non-cloud environments, help them take better advantage of ‘third platform’ applications, and accelerate digital transformation.
This article analyzes today’s new requirements for mission-critical applications and drills down into specific expectations regarding the open architecture, business innovation, and service protection mechanisms. This article also describes how Huawei’s KunLun Mission Critical Server has been engineered to meet mission-critical requirements and how it helps unlock the power of business innovation.
Mission-Critical Computing Becomes the Market Mainstream
With the support of technologies such as cloud computing and Big Data, companies will introduce more intelligent and Internet-based products and services. Every business will be an IT business, and IT’s genes will be rooted in every business. For example, manufacturing companies will accelerate the transition to service-oriented manufacturing with the help of Internet products. In general, enterprises will roll out a steady stream of value-added services to maximize their business value. In addition, Big Data and faster feedback on user needs will shorten the distance between manufacturing plants and consumers, making consumer-to-business (C2B) models a new normal. By analyzing customer preferences, habits, and usage frequency, companies can optimize product design, improve product quality, adjust marketing strategies, and raise the level of after-sales service. The rapid development of Internet+ will shatter industry boundaries, making cross-domain competition and integration more common. Disruption and game-changing newcomers will be another new normal.
IDC believes that the transition of IT from a support system to a production system will bring IT systems to the foreground in an enterprise. In the future, every business will become a technology company.
The accelerated deployment of ‘third platform’ technologies, such as cloud computing and Big Data, has produced a stream of new applications, services, and business models. As users grow more comfortable with IT, technology thresholds fall, but the bar for user experience rises. As a result, enterprise IT infrastructure must accommodate more customers and more applications, gain deeper insight into customer needs, respond faster to market changes, and provide the data and decision support essential to business innovation and transformation. At the same time, CIOs must strike a balance between business support capabilities and cost control.
With improvements in x86 chip performance and the maturity of distributed computing architectures, x86 servers have replaced many low-end and mid-range non-x86 UNIX servers. In China, x86-based servers accounted for about 58 percent of the overall server market in 2010, and their share climbed to 89 percent by 2015. The open-architecture computing platform represented by x86 will continue to maintain strong market momentum for the foreseeable future.
Figure 1 shows the market scale of x86 servers with four or more sockets compared to non-x86 RISC/EPIC servers in China in 2015 and 2014. The data shows a year-on-year increase of 21.1 percent in the market share of high-end x86 servers in 2015, while the market size of RISC/EPIC servers shrank by 19.6 percent.
Figure 1: Open-architecture computing becomes the market mainstream
(Source: IDC, China 2016 x86 Server Market Research Report)
- x86 Server Challenges in Mission-Critical Computing
Mission-critical services refer to applications supporting critical, revenue-generating business processes, including Business Intelligence (BI) analytics, business processes (such as transaction processing and Enterprise Resource Planning [ERP]), and core databases. These core applications have zero tolerance for interruptions, especially in sectors such as government, defense, security, telecommunications, finance, transportation, and medical care.
Standard x86 servers have achieved the reliability and performance required for mission-critical applications mostly through scale-out and application-level control. In fact, IDC finds that x86 servers are still replacing non-x86 servers mainly in the low-end and mid-range market, and mostly through scale-out: deploying large numbers of two-socket to four-socket x86 servers to improve overall system performance and reliability.
At the single-node level, x86 servers fall short of traditional UNIX servers in performance and RAS, so features such as load balancing, High Availability (HA), and data backup must be implemented at the software or application level. x86 servers also still have limitations in compute-intensive applications such as Online Transaction Processing (OLTP), service processing, and Online Analytical Processing (OLAP). For these reasons, many enterprises remain wary of using x86 servers for business-critical applications.
IT system operating costs are also an issue for traditional x86 servers. Power consumption, IT system efficiency, and computing density are of paramount importance to the overall operating efficiency and cost control of data centers. However, a scale-out architecture built from large numbers of x86 servers clearly lacks advantages in efficiency and computing density. Scaling out on x86 requires more racks, bandwidth, and power, as well as a large, technically skilled O&M team. These factors make cost control all the more difficult.
After studying data centers in depth and talking with customers, IDC found that the enormous number of deployed x86 servers has rapidly increased the overall number of enterprise IT devices. This, in turn, has continuously increased data center size, energy consumption, and the management and maintenance burden. ‘Data center sprawl’ has put enterprise IT system O&M and operating cost control under tremendous pressure.
- Value of Scale-up Mission-Critical Servers
From 2012 to 2015, in a global business value study, IDC tracked customers who selected scale-up servers for some mission-critical workloads. The study used both qualitative and quantitative metrics, including IT infrastructure and data center costs, IT staff time requirements, the impact of unplanned downtime, and support for business operations.
IDC found that enterprise users are more likely to adopt the scale-up architecture or converged infrastructure solution for workloads such as business processing, OLTP, BI, and in-memory computing because these architectures are more integrated, computationally dense, and easier to manage.
As shown in Figure 2 and Table 1, the IDC study found that the scale-up architecture has benefits in typical business-critical computing scenarios. Enterprises that choose scale-up architecture servers based on application scenarios and workload types can effectively reduce unplanned system downtime and improve system operational efficiency despite the high cost of a single server. These advantages help reduce IT system Total Cost of Ownership (TCO) by up to 30 percent.
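The TCO comparison comes down to simple arithmetic: sum each architecture's annual costs and express the scale-up saving as a fraction of the scale-out baseline. The Python sketch below assumes four cost categories modeled on the metrics the study tracked (infrastructure, data center, IT staff time, and unplanned downtime); the class and function names are hypothetical, and no IDC figures are used.

```python
from dataclasses import dataclass


@dataclass
class AnnualCosts:
    """Hypothetical annual cost breakdown for one architecture (any currency unit)."""

    infrastructure: float  # servers, storage, and software, amortized per year
    data_center: float     # floor space, power, and cooling
    it_staff: float        # IT staff time for deployment and O&M
    downtime: float        # business cost of unplanned downtime

    def total(self) -> float:
        return self.infrastructure + self.data_center + self.it_staff + self.downtime


def tco_reduction(scale_out: AnnualCosts, scale_up: AnnualCosts) -> float:
    """Fractional annual saving of scale-up relative to the scale-out baseline."""
    baseline = scale_out.total()
    if baseline <= 0:
        raise ValueError("scale-out baseline cost must be positive")
    return (baseline - scale_up.total()) / baseline
```

A result of 0.30 corresponds to the "up to 30 percent" figure. The structure also shows why a scale-up server can cost more to buy yet still lower TCO: the infrastructure line may rise while the data center, staffing, and downtime lines fall.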
Figure 2: Comparison of annual costs of scale-out and scale-up architectures
(Source: IDC, 2016)
Table 1: Comparison of annual costs of scale-out and scale-up architectures
(Source: IDC, 2016)
Today’s enterprise choices for business-critical computing solutions are mainly based on core application support capabilities, IT system flexibility, and TCO. As x86 architecture servers deliver higher performance and better RAS features for mission-critical computing and its application scenarios, the benefits of an open, scale-up architecture will make a big difference in the business-critical computing space.
Huawei KunLun Mission Critical Computing Solution
KunLun is a Huawei x86-based mission-critical server line that includes three products: KunLun 9008, KunLun 9016, and KunLun 9032. These products provide scale-up computing platform options of 8 to 32 CPUs and 6 TB to 32 TB of memory for processing core applications. To meet the requirements of mission-critical applications, KunLun offers high performance, high reliability, an open ecosystem, and good alignment with typical application scenarios.
Figure 3: Huawei KunLun front and rear views (from left: KunLun 9008, 9016, and 9032)
The KunLun series uses the Huawei proprietary Node Controller (NC) chip to interconnect up to 32 CPUs at high speed with nanosecond-level transmission latency. KunLun delivers 40 percent higher performance than traditional UNIX servers. It also supports physical and logical partitioning, delivering a user experience that rivals traditional high-end UNIX servers while flexibly meeting workload demands for computing resources:
• High-speed CPU interconnect enabled by the NC chip: This x86 processor interconnect chip implements a full mesh architecture and supports four Intel QuickPath Interconnect (QPI) links in the downlink and four Network Interface (NI) ports in the uplink. The NC chip pushes beyond the limits of industry-standard 8-socket x86 interconnects, interconnecting 32 Intel Xeon E7 v3/v4 processors at high speed within a single scale-up system (with 64 CPU interconnections). Its nanosecond-level data transmission latency is a leap forward compared with the IP or InfiniBand (IB) interconnects used in scale-out architectures, enabling faster response and higher OLTP efficiency.
• 8S to 32S, single-node, scale-up computing resources: Specifically, KunLun 9032 supports up to 768 cores; 1,536 threads; and 32 TB of memory on a single server. The flexible and powerful scale-up capabilities of KunLun cater to the demands of growing services for computing resources and drive down the initial hardware purchase cost.
• Industry-leading computing performance: Huawei KunLun leads the pack in performance benchmark tests such as SPEC CPU2006 (floating-point and integer computing capability) and SPECjbb2015 (MultiJVM performance). KunLun delivers over 10 million tpmC (a measure of database OLTP performance) with linearity of up to 1.97 times, meeting the computing performance requirements of latency-sensitive and bandwidth-intensive workloads.
• Flexible physical and logical partitioning: Huawei KunLun is currently the industry’s only x86 mission-critical computing platform that supports both physical and logical partitioning. The partitioning features maximize resource utilization and extend the usage experience of traditional mission-critical environments to reduce deployment complexity.
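The capacity figures in these bullets follow from simple arithmetic. The Python sketch below assumes 24-core Xeon E7 v4 processors with two hardware threads per core (Hyper-Threading), which reproduces the 768-core and 1,536-thread totals for 32 sockets. It also reads "linearity up to 1.97 times" as a 1.97x speedup when resources are doubled; that reading is an interpretation, since the benchmark description above does not spell it out. Both function names are hypothetical.

```python
def scale_up_capacity(sockets: int, cores_per_socket: int, threads_per_core: int) -> tuple[int, int]:
    """Total physical cores and hardware threads in one scale-up system."""
    cores = sockets * cores_per_socket
    threads = cores * threads_per_core
    return cores, threads


def scaling_efficiency(speedup: float, resource_ratio: float) -> float:
    """Measured speedup divided by the growth in resources (1.0 = perfectly linear)."""
    if resource_ratio <= 0:
        raise ValueError("resource_ratio must be positive")
    return speedup / resource_ratio
```

Under these assumptions, `scale_up_capacity(32, 24, 2)` gives 768 cores and 1,536 threads, and a 1.97x speedup on doubled resources is 98.5 percent of linear scaling. By the same division, 32 TB of memory across 32 sockets is 1 TB per socket.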
Powered by the Intel Xeon E7 v3/v4 processor family, KunLun leverages Huawei’s innovative RAS 2.0 technology to turn reactive troubleshooting into proactive fault management. These servers provide the most comprehensive RAS features for x86 mission-critical computing platforms to ensure business continuity. RAS 2.0 is a package of code built into the OS, BIOS, and Baseboard Management Controller (BMC). It comprehensively collects module health information and performs real-time online diagnostics that do not rely on the OS, pinpointing where a problem occurs so that the fault can be fixed quickly and precisely. In addition, RAS 2.0 can identify latent problems and intelligently divert tasks running on a potentially faulty module to other resources, minimizing downtime while keeping production jobs up and running. The core design concepts of RAS 2.0 include the following:
• 100 percent modular design and tool-free maintenance without opening the chassis. This maximizes maintenance efficiency.
• Proactive Failure Analysis Engine (PFAE), which proactively generates alerts about potential faults.
• Hot swap of core components such as CPUs and memory modules. The components can be replaced without shutting down the server, maximizing server uptime.
• Multilayer, fault-tolerant architecture with fault-tolerant chips, firmware, and OS. The fully redundant architecture eliminates single points of failure.
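One common way to move from reactive to proactive fault management is to watch the rate of correctable errors per component and raise an alert before they escalate into an uncorrectable failure. The Python sketch below illustrates that general idea with a sliding-window threshold. It is not Huawei's PFAE algorithm; the class name, window length, threshold, and module identifiers are all hypothetical.

```python
from collections import defaultdict, deque


class CorrectableErrorMonitor:
    """Hypothetical predictive-alert sketch: flag a module whose correctable-error
    count within a sliding time window reaches a threshold."""

    def __init__(self, window_s: float = 3600.0, threshold: int = 10):
        self.window_s = window_s
        self.threshold = threshold
        self._events = defaultdict(deque)  # module id -> timestamps of recent errors
        self.flagged = set()               # modules already reported as at risk

    def record(self, module: str, timestamp: float) -> bool:
        """Log one correctable error (timestamps assumed non-decreasing).

        Returns True only when this event causes the module to be newly flagged.
        """
        events = self._events[module]
        events.append(timestamp)
        # Discard errors that have aged out of the sliding window.
        while events and timestamp - events[0] > self.window_s:
            events.popleft()
        if len(events) >= self.threshold and module not in self.flagged:
            self.flagged.add(module)
            return True
        return False
```

In a real system, the events would come from BIOS, BMC, or OS error logs, and a flagged module would trigger the workload diversion described above rather than just a return value.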
Huawei believes that earlier RAS features mainly addressed system downtime caused by highly vulnerable parts such as hard drives, I/O, power supplies, and heat dissipation components. However, according to Huawei’s O&M statistics, failures of core computing components such as DIMMs account for up to 30 percent of system crashes. Such faults are also hard to locate precisely, which lowers on-site O&M efficiency and compounds the losses caused by unexpected system crashes. RAS 2.0 focuses on these situations and specifically tackles the reliability of core components such as the Node Controller (NC), CPUs, and DIMMs.
RAS 2.0 features include the following:
• NC interconnect chip-related features: This chip enables link-level fault tolerance and error recovery, full link redundancy, and online support for NI cable replacement.
• CPU-related features: First, RAS 2.0 provides complete cache protection and error detection and recovery mechanisms for all core and non-core errors; these mechanisms can detect and recover from more than 95 percent of transient and soft errors. For persistent failures, RAS 2.0 isolates the faulty resource before recovering. Second, an Enhanced Machine Check Architecture (EMCA) recovery mechanism, working together with the OS, more than doubles the ability to recover from uncorrectable errors. In addition, RAS 2.0 supports scale-up capabilities that grow with the service.
• Memory-related features: RAS 2.0 provides memory failure detection and recovery mechanisms covering the hardware, BIOS, BMC management software, and OS. RAS 2.0 also offers memory-level error correction capabilities to minimize system crashes caused by memory failures. Further, memory can be added while the servers are in operation.
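The CPU and memory features above distinguish transient or soft errors, which are recovered in place, from persistent failures, which are isolated first. The Python sketch below shows one simple policy for making that distinction by counting repeat errors at the same location. The threshold, location strings, and names are hypothetical, and real platforms make this decision across hardware, firmware, and the OS machine-check handler rather than in application code.

```python
from collections import Counter
from enum import Enum


class Action(Enum):
    RECOVER = "recover"  # treat as transient/soft: correct or retry, then continue
    ISOLATE = "isolate"  # treat as persistent: take the resource offline first


class ErrorPolicy:
    """Hypothetical policy: an error that keeps recurring at the same location is
    treated as persistent and isolated; otherwise it is treated as transient."""

    def __init__(self, persistent_after: int = 3):
        self.persistent_after = persistent_after
        self._seen = Counter()  # location -> number of errors observed
        self.isolated = set()   # locations already taken out of service

    def handle(self, location: str) -> Action:
        if location in self.isolated:
            return Action.ISOLATE
        self._seen[location] += 1
        if self._seen[location] >= self.persistent_after:
            self.isolated.add(location)
            return Action.ISOLATE
        return Action.RECOVER
```

The design trades a short delay in detecting a persistent fault (a few recoveries happen first) for not isolating healthy resources after a single soft error.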
Table 2: Key RAS features of the Huawei KunLun server
- Open Ecosystem Supports KunLun
Huawei partners with leading global companies to foster an open, collaborative, and comprehensive industry chain. With this third-party support, the company offers end-to-end solutions that improve mission-critical business economics and increase enterprises’ ROI in IT systems. This ecosystem offers:
• Complete and mature industry chain: The servers are compatible with mainstream databases (such as Oracle DB, IBM DB2, SQL Server, and SAP HANA), middleware, and OSs (such as Red Hat Linux, SUSE Linux, and Windows Server). Huawei collaborates with partners to promote the development of the industry chain to better meet core enterprise requirements.
• Comprehensive solution capability: Huawei has a professional solution team that provides one-stop services ranging from consulting and planning to after-sales O&M. The company’s extensive experience with migrating from UNIX servers helps enterprises accelerate their transformation to x86 mission-critical computing.
• Better economics for business-critical computing: Compared with closed-architecture UNIX servers, KunLun can reduce TCO by more than 30 percent, helping customers improve overall IT ROI.
- KunLun Aligns with Typical Application Scenarios
Huawei’s KunLun uses its high performance and advanced RAS features to provide powerful support for workloads such as databases, in-memory computing, High-Performance Computing (HPC), and large-scale virtualization:
• Core databases and related applications: Core databases are the most complex and valuable part of an enterprise’s IT system, and organizations are demanding the most stringent performance and reliability for these databases. The core database is the engine driving OLTP and OLAP and is the lifeline of an enterprise. For databases, scale-up servers are better options than scale-out servers because the scale-up machines meet service performance requirements more efficiently. More importantly, they require simpler O&M.
Huawei KunLun Mission Critical Server provides a high-performance scale-up solution for large database applications. With OLTP performance of over 10 million tpmC, KunLun meets the computing performance needs of large-scale service loads. It supports multiple mainstream databases and can provide multi-database consolidation solutions using physical partitioning. These solutions enable enterprises to build more open systems and better control TCO.
• In-memory computing: Over the past 10 years, the total amount of data has increased tenfold every five years. Enterprises often need large-scale IT systems to analyze data in real time to ensure that information about products, customers, and partners is applied to business processes. OLAP applications and real-time analysis capabilities are common requirements these days.
KunLun uses the NC high-speed interconnect chip to support up to 32 CPUs and 32 TB of memory. With these resources, the system can handle some hyperscale workloads and provide real-time, in-memory computing solutions. At the same time, building on its collaboration with SAP and Oracle, Huawei has optimized a KunLun-based in-memory computing platform for typical applications in the finance, manufacturing, and public security sectors, enabling memory capacity and computing performance to grow linearly. This linear expansion lets enterprises configure servers according to the needs and trends of service growth, scaling up processors and memory for optimal efficiency and ROI.
• HPC fat node: For applications such as Computer-Aided Engineering (CAE), gene sequencing, and scientific simulation, each HPC node needs to provide high performance and a large memory. Such nodes are called fat nodes. KunLun servers can work as HPC fat nodes to deliver high performance that cannot be provided by traditional x86 servers. At the same time, KunLun offers higher cost-effectiveness than RISC/EPIC servers.
• Virtualization consolidation: Cloud computing and virtualization help enterprises reduce hardware investment and operating expenses while making business support more responsive. KunLun can provide a virtualization platform based on VMware or Huawei’s FusionSphere. Users can also abstract KunLun’s physical resources into logical resources through Huawei’s multi-partitioning technology, turning one server into several or even hundreds of isolated virtual servers. This approach puts hardware such as CPUs, memory, disks, and I/O into dynamically manageable ‘resource pools’ that increase resource utilization, simplify system management, and enable server consolidation, making IT more resilient to business changes.
KunLun allows a single server to provision multiple virtual machines, enabling users to migrate more heavy-duty services to a cloud platform. Using this cloud architecture saves machine room space and reduces investment in the cloud platform equipment.
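Server consolidation is essentially a packing problem: fit many workloads onto as few hosts (or partitions) as possible. The Python sketch below uses first-fit decreasing, a standard bin-packing heuristic, to place hypothetical vCPU demands onto hosts of a fixed capacity. It is illustrative only; VMware, FusionSphere, and KunLun's partitioning tools are not claimed to use this algorithm, and the function name is hypothetical.

```python
def first_fit_decreasing(demands: list[int], capacity: int) -> list[list[int]]:
    """Pack workload demands (for example, vCPUs) onto hosts of `capacity`.

    Returns one list of placed demands per host, in the order hosts were opened.
    """
    hosts: list[list[int]] = []
    loads: list[int] = []
    # Placing the largest demands first tends to leave fewer unusable gaps.
    for demand in sorted(demands, reverse=True):
        if demand > capacity:
            raise ValueError(f"demand {demand} exceeds host capacity {capacity}")
        for i, load in enumerate(loads):
            if load + demand <= capacity:
                hosts[i].append(demand)
                loads[i] += demand
                break
        else:
            # No existing host has room: open a new one.
            hosts.append([demand])
            loads.append(demand)
    return hosts
```

Raising `capacity` to that of a larger host lets the same demands fit on fewer machines, which is the consolidation effect the text describes. First-fit decreasing is fast but is not guaranteed to find the minimum number of hosts.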
KunLun Offers More Choices for Mission-Critical Applications
Huawei’s KunLun Mission Critical Server brings together Huawei’s most cutting-edge technologies and expertise developed in the server field over many years. Since the system’s launch in Germany in March 2016, it has been extensively tested and deployed in the financial, government, and telecommunications sectors.
In one banking deployment, KunLun replaced a UNIX platform as the computing platform for business-critical applications. As part of the bank’s business system, a KunLun 9032 uses physical partitioning to host database OLTP and middleware logic processing applications. This deployment mirrors the approach used on the UNIX servers and reduces the work of porting or rebuilding software. KunLun has handled tens of thousands of transactions per second during peak hours, more than twice the performance of the original platform. TCO, including hardware purchase, O&M, and software licenses, was cut by about 50 percent, and the system achieves six-nines (99.9999 percent) system-level availability. In addition to the KunLun server, the bank’s solution includes Huawei’s OceanStor 18000 V3 high-end storage system, which provides backup and Disaster Recovery (DR) for mission-critical workloads using a two-site/three-center active-active approach based on Huawei’s HyperMetro remote mirroring technology. Together, they meet the bank’s requirements for system performance, stability, and flexibility.
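Six-nines availability translates directly into a downtime budget. The Python sketch below computes the maximum annual downtime for a given number of nines, assuming an average year of 365.25 days; for six nines this works out to roughly 31.6 seconds per year, compared with roughly 5.3 minutes for five nines. The function name is hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # average year length, accounting for leap years


def annual_downtime_seconds(nines: int) -> float:
    """Maximum downtime per year allowed by an availability of `nines` nines.

    For example, nines=6 means 99.9999 percent availability.
    """
    if nines < 1:
        raise ValueError("nines must be at least 1")
    unavailability = 10.0 ** -nines  # fraction of the year the system may be down
    return SECONDS_PER_YEAR * unavailability
```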
Challenges and Opportunities Facing KunLun
IDC has noted that, as the use of cloud computing, Big Data, mobile and social networks, and other technologies expands, requirements for mission-critical computing systems become increasingly stringent, demanding higher performance and more innovation and openness. Backed by strong R&D investment, Huawei’s KunLun Mission Critical Server targets the needs of this market.
At the same time, IDC notes that Huawei will face some challenges in popularizing the KunLun server. Because core services are so important to an enterprise, enterprise customers often adopt mature systems that have been proven over time. For Huawei, developing well-structured, well-functioning products is only one step in a long journey. The products need to be constantly updated and improved in practical application environments to meet the needs of complex IT services. In particular, KunLun needs further refinement in customer best practices and in user experience across different industries. Huawei recently announced the establishment of its Industry Consulting Solutions Department to better understand the needs of different industries and better serve its customers.
Meeting the Need for Mission-Critical Computing
The enterprise IT environment is experiencing cloud transformation, and IT architectures are developing towards an open, converged, and integrated state. At the same time, enterprises in various sectors anticipate expansion of business-centric critical systems.
Through in-depth research into users’ needs in enterprise mission-critical computing, IDC believes that user demand has undergone disruptive change as a result of digital transformation and the extensive application of ‘third platform’ technologies. Mission-critical computing platforms need to innovate and balance performance, RAS features, and openness to meet this demand. Huawei’s KunLun Mission Critical Servers provide a reliable, high-performance computing platform based on an x86 scale-up architecture that can help customers move to cloud computing and Big Data. More generally, KunLun promises to meet enterprises’ urgent need for highly available, high-performing, and cost-effective mission-critical computing platforms.