FlashLink: The Secret to Huawei’s High-Performance All-Flash Storage

2018-08-15

Disclaimer: The views and opinions expressed in this article are those of the author and do not necessarily reflect the official policy, position, products, and technologies of Huawei Technologies Co., Ltd. If you need to learn more about the products and technologies of Huawei Technologies Co., Ltd., please visit our product pages or contact us.

By Wang Jiaxin, Huawei

In the digital economy era, data is growing explosively, transforming people’s understanding of data. Data used to serve business operations but has now become one of the driving forces enabling digital transformation for enterprises. As enterprises’ data centers host an increasing number of services, data is accessed and moved more and more frequently, so enterprises need data systems that provide lower latency and higher service levels.

According to a survey of hundreds of data system users, about 87% of system performance problems occur in the interaction between the storage subsystem and the application database. In other words, the response latency and concurrent access capacity of the storage subsystem determine those of the application system. High latency and limited concurrency in the storage subsystem have become the performance bottleneck of the entire system, a frustrating reality for many enterprises.

Let’s take a quick look at how the latency that business expansion demands from storage systems has evolved. In the HDD era, enterprise backup and web disk applications had relatively relaxed latency requirements, and 10 ms was enough to meet application demands comfortably. With the emergence of cloud and virtualization technologies, virtual desktop offices have become mainstream in large enterprises, and most virtual desktops today require a latency of 5 ms. Big data has changed business models, and the surge in data volume has had a huge impact on enterprises’ customer relationship management (CRM) and enterprise resource planning (ERP) systems. A latency of 0.5 ms has become the ultimate goal for enterprises that need to ensure quality services.

The 0.5 ms figure is a predictable latency that holds under real-world heavy workloads. It is neither a best-case peak number produced under carefully controlled test parameters nor a number measured when the system is under zero workload. Storage systems are required to maintain a predictable 0.5 ms latency even during service peaks so as to provide users with a consistent experience.
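The article does not say how predictable latency is measured. One common approach, assumed here purely for illustration, is to compare a tail percentile of latencies recorded under load against the target, rather than the best or average case. The function names `percentile` and `meets_latency_target`, the 99th-percentile default, and the sample data are all hypothetical.

```python
import math

# Latency target from the article (0.5 ms), expressed in microseconds.
TARGET_US = 500


def percentile(samples_us, pct):
    """Nearest-rank percentile of latency samples given in microseconds."""
    if not samples_us:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_us)
    # Nearest-rank method: 1-based rank = ceil(pct * n / 100).
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_latency_target(samples_us, pct=99.0, target_us=TARGET_US):
    """True if the chosen tail percentile stays within the target."""
    return percentile(samples_us, pct) <= target_us


if __name__ == "__main__":
    # Synthetic run: 99 requests at 300 us and a single 2 ms outlier.
    samples = [300] * 99 + [2000]
    print(percentile(samples, 99))                 # 300
    print(meets_latency_target(samples))           # True
    print(meets_latency_target(samples, pct=100))  # False
```

A stricter definition could use a higher percentile or the maximum; either way, the target is judged on the tail of a loaded run, not on an idle system.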

Innovative FlashLink Technology

After accumulating technical experience over the past 20 years, Huawei proudly launched the lightning-fast and rock-solid OceanStor Dorado V3 all-flash storage in 2016. Still on the market today, it delivers industry-leading performance powered by the innovative FlashLink technology, which sustains that performance at three levels: chip, architecture, and operating system.

OceanStor Dorado V3 adopts three intelligent chips to achieve end-to-end service acceleration and provides performance 45% higher than SAS all-flash storage. Huawei is a groundbreaking telecommunications provider in that it continuously keeps up with the latest architectural technology trends and even develops its own technologies. For example, OceanStor Dorado V3 is one of the first commercially available all-flash storage systems to use NVMe. Further, OceanStor Dorado V3 adopts a brand-new SSD-optimized design and disk-controller collaboration technology that enables storage controllers to detect data layouts in SSDs in real time and synchronize data between controllers and SSDs. This helps reduce performance losses caused by garbage collection and ensures rapid response to data read and write I/Os. These are just highlights of OceanStor Dorado V3’s extensive feature set, but together they help maintain a predictable latency of 0.5 ms even under heavy workloads. The secret behind these advances is FlashLink, which helps OceanStor Dorado deliver three times the service performance of traditional storage.

[Figure: FlashLink overview]

Innovative Disk-Controller Collaboration Ensures Predictable High Performance

The flash storage cells in an SSD can be rewritten only after being erased. Generally, the basic write unit of an SSD is a 16 KB page, and the basic erase unit is an 8 MB block. Because a whole block is erased at once, any valid pages it still holds must first be migrated to another location so they are not lost. Once migrated, their original locations become invalid, and the entire block can then be erased in a single operation. This process of migrating valid pages and reclaiming blocks is known as garbage collection.
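The page/block asymmetry can be sketched in a few lines of Python. This is a toy model, not Huawei’s implementation: a block is a list whose entries are either live data or `None` for an invalid page, and the greedy victim choice (reclaim the block with the fewest valid pages) is a common textbook policy assumed here for illustration. All function names are hypothetical.

```python
PAGE_KB = 16
BLOCK_KB = 8 * 1024
PAGES_PER_BLOCK = BLOCK_KB // PAGE_KB  # 512 pages per erase block


def valid_pages(block):
    """Count pages that still hold live data (None marks an invalid page)."""
    return sum(1 for page in block if page is not None)


def pick_victim(blocks):
    """Greedy policy (assumed): reclaim the block with the fewest valid pages."""
    return min(range(len(blocks)), key=lambda i: valid_pages(blocks[i]))


def garbage_collect(blocks, free_block):
    """Migrate the victim's valid pages into free_block, then erase the victim.

    Returns (victim index, number of pages migrated).
    """
    victim = pick_victim(blocks)
    survivors = [page for page in blocks[victim] if page is not None]
    free_block.extend(survivors)  # extra writes caused purely by GC
    blocks[victim] = []           # erasure works per block, never per page
    return victim, len(survivors)


if __name__ == "__main__":
    print(PAGES_PER_BLOCK)  # 512
    # Tiny 4-page blocks for readability; a real block here holds 512 pages.
    blocks = [["a", None, "c", "d"], [None, None, None, "h"]]
    spare = []
    victim, moved = garbage_collect(blocks, spare)
    print(victim, moved, spare)  # 1 1 ['h']
```

Every page copied into `spare` is a write the host never asked for, which is why the amount of migrated data matters.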

Garbage collection lets an SSD reuse its space, but every migration degrades the performance of the storage system. The more valid data that must be migrated, and the shorter the time between when a page is written to the SSD and when it becomes invalid, the greater the impact on system performance.
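This cost is commonly quantified as the write amplification factor (WAF): total flash writes divided by the writes requested by the host. As a simplified model (an assumption, not a figure from the article), suppose each reclaimed block of $N$ pages still holds a fraction $u$ of valid pages:

```latex
\[
\mathrm{WAF} = \frac{W_{\text{host}} + W_{\text{GC}}}{W_{\text{host}}}
\]
Reclaiming one block copies $uN$ valid pages and frees $(1-u)N$ pages for new host data, so
\[
\mathrm{WAF} = \frac{(1-u)N + uN}{(1-u)N} = \frac{1}{1-u}.
\]
```

With $u = 0.5$, every host write costs two flash writes; with $u = 0.75$, it costs four. Keeping reclaimed blocks as empty of valid data as possible is what keeps this factor close to 1.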

To get the most performance out of SSDs and flash storage systems, enterprises must effectively control garbage collection. Powered by proprietary SSDs and the flash operating system OceanStor OS, Huawei’s OceanStor Dorado adopts an innovative disk-controller collaboration technology to prevent the performance drops caused by garbage collection. By optimizing internal software algorithms, OceanStor Dorado enables storage controllers to detect the data layouts in SSDs in real time and make adjustments accordingly. This reduces the amount of data that must be migrated after it is written to SSDs, and with it the amount of garbage collection, ensuring predictable high performance for flash storage systems.

Large Block Sequential Writes Reduce the Frequency of Garbage Collection

Take real-time ridesharing as an example. Such services let multiple users with similar routes share a ride, which lowers each user’s travel costs and reduces overall energy consumption.

In the same way, controllers of OceanStor Dorado V3 detect the data layouts in SSDs in real time and aggregate the data blocks to be written to SSDs in the controller cache. The data blocks are converted to a unified format and combined into a larger data block, which is written to SSDs in a single operation to improve overall system performance. Detailed benefits include:

Large block sequential writes replace many small random writes (I/Os) to SSDs with fewer, larger ones, making full use of the back-end SAS bandwidth.

The RAID write penalty (the extra reads and writes required to update parity) has long been one of the factors limiting performance in storage systems that use RAID protection. OceanStor Dorado V3 writes data to SSDs in one pass after aggregation, effectively reducing both the number of disk writes and the number of extra read and write requests needed for parity. This keeps system performance predictable when RAID 5, RAID 6, or RAID-TP is used. Even in the unlikely scenario of three disks failing concurrently, RAID-TP, a unique technology of OceanStor Dorado V3, ensures that services remain unaffected.

OceanStor Dorado also supports global garbage collection. It monitors system load in real time and controls the frequency of garbage collection in disks, mitigating its impact on system performance.
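The back-end I/O savings from aggregation can be estimated with simple arithmetic. The sketch below assumes the textbook read-modify-write pattern for a small in-place update (read the old data and parity chunks, then write the new ones) and extends it to RAID-TP with three parity chunks. That RAID-TP figure, the 8-data-chunk stripe, and the `WriteAggregator` class are illustrative assumptions, not OceanStor internals.

```python
from dataclasses import dataclass, field

# Parity chunks per stripe. RAID 5 and RAID 6 are standard; three parity
# chunks for RAID-TP is an assumption based on its triple-failure tolerance.
PARITY_CHUNKS = {"RAID 5": 1, "RAID 6": 2, "RAID-TP": 3}


def rmw_ios(level):
    """Back-end I/Os for one small random write updated in place (read-modify-write)."""
    touched = 1 + PARITY_CHUNKS[level]  # one data chunk plus every parity chunk
    return 2 * touched                  # read each old chunk, then write each new one


def full_stripe_ios(level, data_chunks):
    """Back-end I/Os for a full-stripe write: parity is computed in cache, no reads."""
    return data_chunks + PARITY_CHUNKS[level]


@dataclass
class WriteAggregator:
    """Hypothetical controller cache that coalesces small writes into full stripes."""
    level: str
    data_chunks: int
    pending: list = field(default_factory=list)
    backend_ios: int = 0

    def write(self, chunk):
        self.pending.append(chunk)
        if len(self.pending) == self.data_chunks:
            # Flush one large sequential write: all data chunks plus new parity.
            self.backend_ios += full_stripe_ios(self.level, self.data_chunks)
            self.pending.clear()


if __name__ == "__main__":
    agg = WriteAggregator("RAID 6", data_chunks=8)
    for i in range(80):
        agg.write(f"blk{i}")
    print(agg.backend_ios)         # 100: 10 stripes x (8 data + 2 parity)
    print(80 * rmw_ios("RAID 6"))  # 480: 80 writes x 6 I/Os each
```

Under these assumptions, 80 aggregated small writes cost 100 back-end I/Os instead of 480, and require no reads at all.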

[Figure: Large block sequential write technology]

Independent Metadata Partition Controls the Frequency of Garbage Collection

In a storage system, metadata is typically updated far more often than user data. When metadata and user data are written to the same partition of a disk, more garbage collection is required than when the partition holds user data alone. This is because when the metadata pages in a block become invalid, the user data pages in the same block may still be valid. As a result, large amounts of user data must be migrated during garbage collection, causing excessive write amplification on the disk and shortening the service life and lowering the performance of the SSD.

OceanStor Dorado V3 all-flash storage uses independent metadata partitioning. It writes frequently updated metadata to one partition and less frequently updated user data to a different partition, in both the storage system and the SSDs. This reduces the migration of user data blocks when metadata is updated, mitigating the impact of garbage collection on system performance. In simple terms, independent metadata partitioning reduces the amount of garbage collection, ensuring predictable high performance of the storage system.
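The effect of separating frequently updated metadata from user data can be shown by counting how many pages garbage collection must copy. The half-metadata, half-user-data block below is a hypothetical layout chosen for illustration, the function names are made up, and the 512-page block follows from the 16 KB page and 8 MB block sizes given earlier.

```python
PAGES_PER_BLOCK = 512  # 8 MB erase block / 16 KB page


def pages_to_migrate(block, stale_kind):
    """Pages GC must copy out of `block` once every `stale_kind` page is invalid."""
    return sum(1 for kind in block if kind != stale_kind)


def write_amplification(block, stale_kind):
    """Flash writes per host write when this block is reclaimed, i.e. 1 / (1 - u)."""
    migrated = pages_to_migrate(block, stale_kind)
    freed = len(block) - migrated
    if freed == 0:
        raise ValueError("block has no invalid pages to reclaim")
    return (freed + migrated) / freed


# Shared partition: metadata and user data interleaved in one block.
mixed = ["meta", "user"] * (PAGES_PER_BLOCK // 2)
# Independent metadata partition: the block holds metadata only.
meta_only = ["meta"] * PAGES_PER_BLOCK

# Metadata is rewritten often, so assume all of its old copies are now invalid.
print(pages_to_migrate(mixed, "meta"), write_amplification(mixed, "meta"))          # 256 2.0
print(pages_to_migrate(meta_only, "meta"), write_amplification(meta_only, "meta"))  # 0 1.0
```

In the shared layout, reclaiming the block forces 256 untouched user pages to be rewritten; in the separated layout, the metadata block can simply be erased.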

[Figure: Independent metadata partitioning]

Prioritizing Data Read and Write I/Os

Large financial institutions, such as big banks, often set up dedicated counters for VIP customers to separate them from regular customers. In addition, when the VIP counters are busy, VIP customers can go to the front of the queue at regular counters. This model ensures that VIP customers are served faster than regular customers.

OceanStor Dorado adopts the same model, introducing an I/O priority scheduling mechanism to ensure predictably low latency for service requests. OceanStor Dorado gives data read/write requests priority access to system resources, including CPU, memory, and concurrent disk access. Other requests, such as data reconstruction, asynchronous cache flushing, and other background tasks within the system, yield when resources are contended.

OceanStor Dorado performs priority adjustment in both the storage controller and the SSD, in step, so that data read/write requests always have top priority. Other types of I/O requests are suspended when a read/write request arrives and resume after the read/write operation completes, guaranteeing optimal read and write response latency in the storage system.
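A minimal priority queue captures the dispatch rule described above: host reads and writes are always served before rebuild, cache-flush, and garbage collection work. This is a hypothetical sketch; the `IOScheduler` class, the priority values, and the ordering among background classes are assumptions, and it only reorders queued requests rather than suspending I/O already in flight.

```python
import heapq
import itertools

# Lower number = higher priority. Host reads/writes always win; the order
# among background classes is an assumption made for illustration.
PRIORITY = {"read": 0, "write": 0, "rebuild": 1, "cache_flush": 2, "gc": 3}


class IOScheduler:
    """Hypothetical scheduler in which background work yields to host I/O."""

    def __init__(self):
        self._queue = []
        self._seq = itertools.count()  # FIFO tie-break within one priority class

    def submit(self, kind, name):
        heapq.heappush(self._queue, (PRIORITY[kind], next(self._seq), kind, name))

    def dispatch(self):
        """Return the next (kind, name) to service, or None if the queue is empty."""
        if not self._queue:
            return None
        _, _, kind, name = heapq.heappop(self._queue)
        return kind, name


if __name__ == "__main__":
    sched = IOScheduler()
    sched.submit("gc", "gc-1")
    sched.submit("rebuild", "rebuild-1")
    sched.submit("read", "read-A")  # arrives late but is served first
    sched.submit("write", "write-B")
    order = []
    while (req := sched.dispatch()) is not None:
        order.append(req[1])
    print(order)  # ['read-A', 'write-B', 'rebuild-1', 'gc-1']
```

The sequence counter keeps requests of equal priority in arrival order, so host I/O is never reordered among itself.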

[Figure: I/O priority adjustment]

3x Higher Performance of All-Flash Storage

In the digital transformation era, replacing traditional storage with all-flash opens a new chapter. Individuals and enterprises no longer need to wait for applications to respond, at work or in daily life.

For the financial industry, and especially for securities firms handling frequent real-time transactions, time is money. Take Hundsun in China as an example. Before it cooperated with Huawei, its traditional IT architecture supported only 60,000 transactions per second (TPS) at business peaks, short of the desired 100,000 TPS. With OceanStor Dorado V3, Hundsun can process 150,000 TPS, with room to scale for future business expansion.

In the manufacturing industry, the batch processing capabilities of data warehouses are the basis for ERP business analysis. For example, BYD, the largest new energy vehicle manufacturer in China, needed at least 3.5 hours a day to batch process business requests. When business volume was high, processing took so long that it put great pressure on decision makers planning the next day’s business. After BYD deployed Huawei’s OceanStor Dorado V3, batch processing takes only 1 hour and 12 minutes, leaving sufficient time for final decision making.

In the medical industry, the hospital information system (HIS) is the core of hospital service management. It connects multiple processes, such as registration, diagnosis, treatment, billing, and medication. Take a well-known tertiary hospital in China as an example. With its traditional IT architecture, registration took 3 seconds per patient on average, and each patient had to wait in three to six queues during a visit, which took at least one hour. After the hospital adopted Huawei OceanStor Dorado, registration takes only 0.5 seconds per patient, improving diagnosis efficiency and the doctor-patient relationship.

Storage leasing is one of the main services of carriers and independent service providers (ISPs). Take ACESI Group, the largest ISP in eastern France, as an example. The speed of batch VM deployment was a major concern for ISPs because it determined how quickly new services could be rolled out. It took 30 minutes to deploy 100 VMs on traditional storage, making rapid service development impossible. OceanStor Dorado V3 shortened the deployment time for 100 VMs to just 10 minutes. In addition, Huawei helped ACESI develop new platinum leasing services based on the high-performance Dorado all-flash storage, enhancing ACESI’s overall competitiveness and helping it stay ahead of competitors in the industry.
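The before-and-after figures in these case studies can be turned into improvement ratios with a few lines of Python. Only numbers quoted in the case studies are used; the `CASES` table and `improvement` helper are illustrative, and because the BYD baseline is the stated minimum of 3.5 hours, its ratio is a lower bound.

```python
# (before, after, higher_is_better) for each case study quoted above.
CASES = {
    "Hundsun peak TPS": (60_000, 150_000, True),
    "BYD daily batch window (min)": (3.5 * 60, 72, False),
    "Hospital registration per patient (s)": (3.0, 0.5, False),
    "ACESI deployment of 100 VMs (min)": (30, 10, False),
}


def improvement(before, after, higher_is_better):
    """Factor by which a metric improved; values above 1 mean better."""
    return after / before if higher_is_better else before / after


if __name__ == "__main__":
    for name, (before, after, hib) in CASES.items():
        print(f"{name}: {improvement(before, after, hib):.2f}x")
```

The ratios come out at 2.50x (Hundsun), 2.92x (BYD), 6.00x (hospital registration), and 3.00x (ACESI).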

In the future, Huawei’s OceanStor Dorado V3 all-flash storage will benefit more customers.
