
    Building a Robust HPDA Storage Base for Quality Education and Scientific Research

As the infrastructure and data flow requirements of scientific research change, there is a growing need for storage solutions that accommodate high-performance data analytics (HPDA). Huawei's HPDA storage solution optimizes computing, storage, and network resources based on the characteristics of teaching and scientific research services. This solution is cost-effective, fast, and easy to use, providing a solid data foundation for scientific research centers in universities.

Informatization is boosting education development and has become an important factor in measuring the quality of education in a country. The informatization infrastructure of basic and higher education plays an important role in transforming education ideas and concepts, promoting teaching and learning reform, and modernizing management methods. Universities rely on high-performance computing (HPC) platforms to promote the digital transformation of education, and many have developed their own HPC platforms. Hybrid computing, however, requires a more robust and powerful storage base that supports HPDA scenarios.

From Decentralized to Intensive

In recent years, investment in education infrastructure has increased worldwide. For example, in 2018, the EU proposed the EuroHPC Joint Undertaking (EuroHPC JU) to coordinate resources between the EU and its member states. The EU planned to invest 1 billion euros from 2019 to 2026 to build a European HPC and big data system supported by world-class HPC and data infrastructure. By 2020, the figure had increased to 8 billion euros, aiming to drive the development of scientific computing and supercomputing-oriented interdisciplinary research within universities.

In China, universities used to build several small supercomputing platforms, each meeting the computing requirements of a single department or subject group. Because each faculty built its own independent platform, computing resources were difficult to share, resulting in low utilization of computing power. Dedicated equipment rooms were lacking, power supply assurance was inadequate, and power consumption and fault rates were high. More importantly, the platforms lacked professional O&M staff. The skills of the existing O&M staff varied, and safety issues could easily be overlooked.

With the development of education informatization, these and other issues needed to be solved urgently. Therefore, in July 2021, the Ministry of Education of the People's Republic of China and six other departments released the Guideline on Promoting the Construction of New Education Infrastructure and Building a High-quality Education Support System, clearly stating that education should be data-driven with collaborative governance and proactive services. The guideline stipulated that particular attention should be paid to the shared and intensive construction of intelligent infrastructure and high-performance scientific research equipment in universities to build an efficient, secure, and reliable education infrastructure system.

From "Computing-centric" to "Data-centric"

HPC depends on substantial computing power, the engine that drives the digital transformation of society as a whole. It also requires ample data as the foundation for computing. As the digital economy develops, the relationship between data and computing power is changing.

In the past, computing power was the mainstay. For the last half century, data was regarded as merely auxiliary to computing power when it came to solving complex scientific and engineering problems. As a result, the industry put far more emphasis on improving computing power, and the use of data was sidelined.

But now, data has become the star of the show while computing power remains the accompaniment. Diversified computing power provides more abundant computing resource options for HPC. However, storage capability is struggling to keep pace with the rapid growth of computing power. In this context, multiple heterogeneous computing resources are often clustered around a single data storage system. HPC is evolving from computing-centric to data-centric, making scientific research more data-intensive. Applications for such scientific research models are characterized by a non-repetitive nature, high uncertainty, high dimensionality, and high computational complexity. In addition, the requirements for the underlying data infrastructure have changed fundamentally, and scientific research now places increasing demands on data flow. As such, HPDA scenarios in education and scientific research face four challenges:

• First, hybrid computing requires a more robust and powerful storage base that supports HPDA scenarios. Both the intelligent computing centers of scientific research institutions in universities and the supercomputing centers of governments need to run different types of scientific research applications at the same time. HPC and HPDA both impose bandwidth-intensive and operations-per-second (OPS)-intensive service loads at different stages of a process. Traditional storage supports only a single load model, resulting in functional silos where data from different stages must be migrated between different storage systems, which reduces the efficiency of the entire process. Therefore, powerful HPDA data storage is required to support these hybrid workloads and reduce data bloat and migration.

• Second, as research topics expand, data volumes are soaring, which poses new challenges to equipment room space, power consumption, and storage costs. As volumes grow from petabytes to exabytes, data yields more value, and users tend to retain it for longer. Therefore, HPDA data storage must use technologies with high density, efficient redundancy, access frequency tiering, and deduplication and compression to reduce storage costs.

• Third, in HPDA scenarios, different systems require different storage services. For example, gene sequencing demands file, big data, and object services in different phases of the process, making efficient access to process data challenging. Therefore, HPDA storage should improve the efficiency of multi-application collaborative analysis and promote cross-disciplinary convergence. Agile service response requires one centralized system that supports multiple protocols for data exchange, reduces intermediate links, and manages and maintains resources centrally.

• Fourth, in scenarios like high-energy physics analysis, weather forecasting, and pharmaceutical research, massive amounts of data need to be analyzed and processed within a short period of time. This places high demands on the processing capability of the HPDA storage system.
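
The access frequency tiering mentioned in the second challenge can be pictured with a minimal Python sketch. Everything in it is an illustrative assumption rather than a description of any product: the tier names, the 7-day and 90-day thresholds, and the FileStat, choose_tier, and plan_tiers names are all hypothetical.

```python
from dataclasses import dataclass

# ASSUMPTION: hypothetical thresholds, chosen only for illustration.
HOT_MAX_IDLE_DAYS = 7
WARM_MAX_IDLE_DAYS = 90


@dataclass
class FileStat:
    """Minimal per-file record used by the sketch (hypothetical type)."""
    path: str
    size_bytes: int
    idle_days: int  # days since the file was last accessed


def choose_tier(idle_days: int) -> str:
    """Map time since last access to a storage tier name."""
    if idle_days < 0:
        raise ValueError("idle_days must be non-negative")
    if idle_days <= HOT_MAX_IDLE_DAYS:
        return "hot"
    if idle_days <= WARM_MAX_IDLE_DAYS:
        return "warm"
    return "cold"


def plan_tiers(files: list[FileStat]) -> dict[str, int]:
    """Sum the bytes that would be placed on each tier."""
    plan = {"hot": 0, "warm": 0, "cold": 0}
    for f in files:
        plan[choose_tier(f.idle_days)] += f.size_bytes
    return plan


if __name__ == "__main__":
    sample = [
        FileStat("run42/input.dat", 4_000, idle_days=1),
        FileStat("run17/output.h5", 10_000, idle_days=30),
        FileStat("archive/2019.tar", 50_000, idle_days=400),
    ]
    print(plan_tiers(sample))
```

The point of the sketch is only that rarely accessed data can be placed on cheaper, denser media while recently used data stays on faster media; a real system would base placement on richer policies than a single idle-time value.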

Storage is now the most important part of HPDA solutions for education and scientific research. As HPC evolves to data-intensive HPDA, the industry needs a storage solution that supports hybrid loads, multi-protocol interoperability, and ultra-high-density design to cope with soaring HPDA workloads.

A Next-generation Storage Solution

With years of in-depth engagement in both the education and scientific research sectors, Huawei has launched a next-generation HPDA storage solution. Using the OceanStor Pacific distributed storage and parallel file system, this solution optimizes computing, storage, and network resources based on the characteristics of teaching and scientific research services. It is scenario-based, cost-effective, fast, and easy to use.

• Cost-effective storage: Massive data storage for data surges

OceanStor Pacific offers a series of ultra-high-density hardware that supports automatic data tiering across different hardware types, improving capacity per unit of space by 20% and greatly reducing storage costs in HPC scenarios. OceanStor Pacific supports 120 3.5-inch hard disk drives (HDDs) in a 5U space, freeing up more than 60% of the available cabinet space and easing the burden of massive data storage.
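
The density figure above can be put into perspective with a short Python calculation. The 120-drive, 5U numbers come from this article; the comparison chassis of 36 drives in 4U is a hypothetical baseline chosen only for illustration and is not a figure from the article, and the function names are likewise made up for this sketch.

```python
def drives_per_rack_unit(drive_count: int, rack_units: int) -> float:
    """Return drive density as drives per rack unit (U)."""
    if rack_units <= 0:
        raise ValueError("rack_units must be positive")
    return drive_count / rack_units


def space_saved_fraction(dense: float, baseline: float) -> float:
    """Fraction of rack space freed when the same number of drives
    moves from the baseline density to the denser chassis."""
    if dense <= 0 or baseline <= 0:
        raise ValueError("densities must be positive")
    return 1 - baseline / dense


if __name__ == "__main__":
    # From the article: 120 HDDs in a 5U chassis.
    dense = drives_per_rack_unit(120, 5)

    # ASSUMPTION: hypothetical baseline of 36 HDDs in 4U.
    baseline = drives_per_rack_unit(36, 4)

    print(f"dense chassis:    {dense:.0f} drives per U")
    print(f"baseline chassis: {baseline:.0f} drives per U")
    print(f"rack space freed: {space_saved_fraction(dense, baseline):.1%}")
```

With this assumed baseline, the sketch reports 24 drives per U against 9 drives per U, or 62.5% of rack space freed. A different baseline would give a different percentage, so the result illustrates the arithmetic rather than verifying the article's figure.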

• Fast computing: Next-generation distributed file system and Distributed Parallel Client (DPC)

OceanStor Pacific is equipped with a next-generation distributed file system and a proprietary DPC client. Just one storage system can provide twice the bandwidth for large files and five times the input/output operations per second (IOPS) for small files. In addition, the file system supports multiple types of service loads to accelerate computing in all hybrid load scenarios, which not only meets common service requirements but also allows enterprises to explore new services based on technologies like big data and AI with ease.

• Easy to use: One copy of migration-free data minimizes O&M complexity and improves system availability

OceanStor Pacific can meet diversified computing power requirements, improve the efficiency of multi-application collaborative analysis, and promote cross-disciplinary convergence. It offers a centralized system that supports multiple protocols for data exchange, reduces intermediate links, and manages and maintains resources centrally for agile service response.

The HPC platform of Shanghai Jiao Tong University uses Huawei OceanStor Pacific distributed storage with large and small I/O adaptive processing technology to intelligently layer data flows, cutting the time needed for a computing workload from three months to four days.

HPDA scenarios in education and scientific research are changing the distributed storage architecture. Huawei Storage will continue to innovate in hardware, software, algorithms, and architecture, delivering higher reliability, availability, and usability to achieve optimal HPDA.