SJTU Builds a Unified Data Foundation for Its Data-intensive HPC Platform with OceanStor Pacific
There is an old Chinese saying, "Having broad horizons and following the trend of the times can generate the talent a country needs." One of China's oldest top universities, Shanghai Jiao Tong University was founded a century ago during a time of national crisis. It has since remained a beacon of talent cultivation for one generation after another. For many years, Shanghai Jiao Tong University has been pioneering the digital transformation of education. It has established a university-level computing platform "Counting On Me" with the digital construction concept of "inclusive + integrated".
Built in 2013, "Counting On Me" has become a leading university-level computing platform in China. It has served more than 900 scientific research teams and 180 undergraduate and graduate courses, supported the publication of over 400 papers in top academic journals such as Science and Nature, and has enabled more than 10,000 students to complete their "cloud-based practice".
"What used to take three months can now be completed in just four days by a supercomputing cluster on the 'Counting On Me' platform." The platform's accelerated computing enables not only a much faster paper submission cycle, but also much lower trial and error costs for researchers. Using accelerated computing power in the education industry, Shanghai Jiao Tong University has digitally transformed education in the changing landscape.
At the core of the platform is high-performance computing (HPC). Often described as the crown jewel of the IT industry, HPC depends on how computing power and data are allocated. Computing power is the engine of digital transformation, and data is the fuel that keeps that engine running and improving. As the digital economy develops and brings myriad challenges, the relationship between data and computing power is changing.
In the past, data was centered around computing power. For the past half century, the main focus was on building ever stronger computing power to deliver rapid numerical solutions to complex scientific and engineering problems. Data was treated only as a supporting facility.
Today, computing power is centered around data. The value of data is increasingly seen in numerous emerging applications, data volume expansion, and data resilience. With diverse computing power and the support of digital technologies such as AI and big data, traditional HPC has evolved into data-intensive HPC. Multiple kinds of heterogeneous computing power are tightly centered around the same datastore.
There is an old Chinese saying, "When eating fruits, one should think of the fruit trees; when drinking water, one should think of the headwaters." Shanghai Jiao Tong University has adhered to this principle. It has stayed attuned to the evolution from "computing-centric" to "data-centric" and is leading the industry with its data-intensive HPC platform for universities in China.
Huawei Storage has long been an advocate and practitioner in the HPC industry. Since 2019, Shanghai Jiao Tong University has been deepening its cooperation with Huawei Storage to build the data-intensive "Counting On Me" HPC platform. At the very beginning of this journey, the two parties identified five challenges in the construction of the platform:
• Rapid data growth: Every year, the platform handles 7 PB of new data. The data encompasses massive volumes of scientific research data from the university's main campus, School of Medicine, and affiliated hospitals. The biomedical field accounts for more than 40% of the data, which comprises critical bioinformation and medical imaging that simply cannot be compromised. Therefore, the primary platform construction challenge is how to build a data infrastructure that can handle this volume of data.
• High performance requirements: To sift through these massive volumes of data, users run a large number of high-throughput jobs that touch millions of small files. Traditional HDD storage struggles under this kind of workload. Therefore, all-flash media is urgently needed to improve file system performance.
• Multi-cluster sharing: The "Counting On Me" platform provides different kinds of heterogeneous computing power, including Arm clusters, x86 clusters, and AI clusters. The platform needs to achieve full data mobility and data convergence among clusters to provide convenience to users and fully unleash the potential of data and computing power.
• Data sharing and management: A standard Simple Storage Service (S3) access interface enables students and faculty who have made scientific and technological achievements to better share and manage their research data. There are two main ways to provide S3 access: in hardware, with the interface built into the storage system, or in software, with protocol translation layered on top of self-built Lustre storage. The key bottleneck is the access performance of the protocol.
• Hot/cold data migration: Data volume expansion generates high storage costs. A survey shows that nearly 50% of platform users' data is not accessed in the short term, but this data still needs to be retained for future use, especially the experimental data from scientific research teams. Therefore, it is necessary to store less frequently used cold data on more cost-effective cold storage.
Huawei drew from its wealth of experience in technological innovation and application in the HPC industry to launch the OceanStor Pacific series. The scale-out storage products serve as the unified data foundation of the "Counting On Me" platform, supporting multiple heterogeneous computing platforms in the university. Huawei OceanStor Pacific scale-out storage has the following advantages for meeting the diversified needs of data storage:
Huawei OceanStor Pacific scale-out storage's fully symmetric scale-out architecture is critical to the linear growth of capacity and performance. In 2019, it helped expand the "Counting On Me" platform's capacity and bandwidth from 2 PB to 10 PB and from 6 GB/s to 30 GB/s, respectively. In 2020, the capacity and bandwidth were expanded to 20 PB and 60 GB/s, respectively.
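The "linear growth" claim can be checked against the figures quoted above. The short sketch below is only an illustration: the function names are hypothetical, and it simply tests whether bandwidth per petabyte stays constant across the three quoted capacity points.

```python
# Capacity (PB) and aggregate bandwidth (GB/s) figures quoted in the text.
checkpoints = [
    ("before the 2019 expansion", 2, 6),
    ("after the 2019 expansion", 10, 30),
    ("after the 2020 expansion", 20, 60),
]


def bandwidth_per_pb(capacity_pb, bandwidth_gb_s):
    """Return aggregate bandwidth (GB/s) delivered per PB of capacity."""
    return bandwidth_gb_s / capacity_pb


def is_linear(ratios, tolerance=1e-9):
    """Scaling is linear if every checkpoint has the same GB/s-per-PB ratio."""
    return all(abs(r - ratios[0]) <= tolerance for r in ratios)


ratios = [bandwidth_per_pb(cap, bw) for _, cap, bw in checkpoints]

for (label, cap, bw), ratio in zip(checkpoints, ratios):
    print(f"{label}: {cap} PB, {bw} GB/s -> {ratio:.1f} GB/s per PB")
print("linear scaling:", is_linear(ratios))
```

At every quoted point the ratio works out to 3 GB/s per PB, which is what linear scaling of bandwidth with capacity would look like.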
In addition, the OceanStor Pacific features an ultra-high-density design with 120 disk slots per 5 U chassis, 20% higher than the industry average. It works with an ultra-high-ratio EC data redundancy protection algorithm to improve disk utilization to 91.6% while ensuring reliability.
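To see how a high-ratio erasure coding (EC) scheme reaches a utilization figure in this range, consider the general relationship: with k data fragments and m parity fragments, the usable share of raw capacity is k / (k + m). The article does not state which scheme the platform uses, so the 22+2 layout below is an assumption, chosen only because it yields about 91.67%, close to the quoted 91.6%.

```python
from fractions import Fraction


def ec_utilization(data_fragments, parity_fragments):
    """Usable share of raw capacity for a k+m erasure coding scheme."""
    return Fraction(data_fragments, data_fragments + parity_fragments)


# Illustrative schemes only; 22+2 is an assumed layout, not confirmed by the source.
schemes = [(4, 2), (8, 2), (12, 2), (22, 2)]

for k, m in schemes:
    share = ec_utilization(k, m)
    print(f"EC {k}+{m}: {float(share) * 100:.2f}% usable, "
          f"tolerates the loss of {m} fragments")

# For comparison, three-way replication stores one usable copy out of three.
replication_3x = Fraction(1, 3)
print(f"3-way replication: {float(replication_3x) * 100:.2f}% usable")
```

The wider the stripe for a fixed number of parity fragments, the higher the utilization, which is why ultra-high-ratio EC can approach 92% while still tolerating multiple failures.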
Huawei OceanStor Pacific scale-out storage uses the innovative scale-out file system OceanFS and unique adaptive I/O data flow technology. OceanStor Pacific significantly outperforms open source Lustre systems in three areas: bandwidth under mixed read and write requests from clients, file creation, and IOPS under mixed reads and writes.
The OceanStor Pacific object storage supports the native semantics of S3 interfaces, and the protocol efficiency is more than four times that of self-built Lustre systems. On top of that, when handling massive numbers of small files, OceanStor Pacific can ensure a stable performance of 1 million TPS for 100 billion objects in a single bucket, greatly improving the computing and processing efficiency for platform users.
Huawei OceanStor Pacific scale-out storage uses the SmartTier intelligent storage tiering technology to manage hot, warm, and cold data in the domain in a unified way. This technology automatically identifies and evaluates how frequently data is accessed, and then places hot, warm, and cold data into SSD, HDD/SSD hybrid, and HDD storage, respectively. Data migrates automatically without manual intervention, which enables flexible and effective management over the entire data lifecycle.
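The general idea of frequency-based tiering can be sketched in a few lines. This is not SmartTier's actual algorithm or API, which the article does not describe; the thresholds, the 30-day window, the file paths, and all names below are hypothetical and serve only to illustrate how access counts could map to the three tiers.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    SSD = "ssd"               # hot data
    HYBRID = "hdd/ssd hybrid" # warm data
    HDD = "hdd"               # cold data


@dataclass
class FileStats:
    path: str
    accesses_last_30_days: int
    current_tier: Tier


# Hypothetical thresholds; a real policy would be tuned per workload.
HOT_THRESHOLD = 20
WARM_THRESHOLD = 1


def target_tier(stats):
    """Map a file's recent access count to the tier it should live on."""
    if stats.accesses_last_30_days >= HOT_THRESHOLD:
        return Tier.SSD
    if stats.accesses_last_30_days >= WARM_THRESHOLD:
        return Tier.HYBRID
    return Tier.HDD


def plan_migrations(files):
    """Return (path, from_tier, to_tier) for every file on the wrong tier."""
    return [
        (f.path, f.current_tier, target_tier(f))
        for f in files
        if target_tier(f) is not f.current_tier
    ]


files = [
    FileStats("genome/run42.bam", 35, Tier.HDD),          # became hot
    FileStats("imaging/scan7.dcm", 3, Tier.SSD),          # cooled to warm
    FileStats("archive/2019/results.tar", 0, Tier.HDD),   # already cold
]

plan = plan_migrations(files)
for path, src, dst in plan:
    print(f"migrate {path}: {src.value} -> {dst.value}")
```

In this sketch only the first two files are scheduled to move; the archive file is already on the cheapest tier, so no migration is planned for it.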
Shakespeare wrote, "What is past, is prologue." In a similar vein, there is a Chinese saying, "The journey ahead may be long and arduous, but with sustained action, we will eventually reach our destination and embrace a brighter future." The scale-out "Counting On Me" platform takes the lead in building a data-intensive HPC platform with a unified foundation, setting a benchmark for government-industry-academia-research collaborative innovation in digital transformation of college education.
Looking ahead, Shanghai Jiao Tong University will work with Huawei on further cooperation and exploration in areas including extreme capacity scalability, cross-campus and cross-region resource sharing and collaboration, and privacy-preserving computing. With our solid technological foundations and deep insights into the HPC industry, we will work together to embrace the ever-changing landscape of data storage and application, lead the way in building an intelligent data-intensive HPC industry, and write a new chapter for digital education development.