    Education and Scientific Research RDM Data Lake Solution

    Efficiently and FAIR-ly manage massive amounts of research data.

FAIR Principles and Goals of Research Data Management (RDM)

The FAIR principles are internationally recognized guidelines that underpin effective RDM. Making data FAIR (Findable, Accessible, Interoperable, and Reusable) helps ensure the integrity, usability, and security of stored scientific data.

RDM is a major topic in modern science that goes beyond conventional data collection and storage. It covers all necessary processes related to data flows to ensure that data can be securely collected, stored, and analyzed while remaining usable over long periods of time.

Challenges for RDM

  • Insufficient Storage

    • Data volumes are growing explosively, from petabytes (PB) to exabytes (EB).
    • Massive numbers of large files on open platforms must be retained securely for at least 10 years.
    • Multiple copies are kept for disaster recovery, which pushes space utilization below 30%.
  • Data Silos

    • Data sharing across universities and institutions remains difficult, hindering interdisciplinary, cross-institutional, and cross-border collaboration that is essential for advancing open science.
    • HPC scenarios vary widely, and heterogeneous research workflows and data types involve complex data processing over diverse protocols.
  • Complex Management

    • Diverse storage devices used across universities and institutes result in various access modes and increased management complexity.
    • The lack of unified metadata management hinders efficient data retrieval in subject research, interdisciplinary projects, data cleansing, and cooperative research.
    • The lack of secure data sharing mechanisms makes it difficult to ensure the privacy, security, and regulatory compliance of high-value data.
Architecture

The Education and Scientific Research RDM Data Lake Solution is designed for education and scientific research scenarios. It is built on the DME data management platform and OceanStor Pacific all-flash scale-out storage. Using industry-unique 3-site synchronous multi-active and 12-site asynchronous multi-active technologies based on the S3 object protocol, the solution provides flexible research data sharing, reliable disaster recovery, and on-demand data mobility between the HPC platform and the RDM data lake. DME (Omni-Dataverse) schedules data uniformly across data centers and clusters, so results can be retrieved from tens of billions of files in seconds.

Benefits

High Density & Green

• High capacity density: The performance pool houses 36 × 61.44 TB SSDs in 2U of rack space, and the capacity pool houses 120 disks in 5U.
• Unparalleled capacity utilization: High-ratio erasure coding (EC) in a single cluster achieves 91.6% utilization. HyperGeoEC across clusters stores only EC fragments rather than full copies of the data.
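
The figures above can be checked with simple arithmetic. The sketch below compares the usable share of raw capacity under multi-copy replication with that under erasure coding, and computes the raw capacity of the 2U performance pool. The 22+2 EC layout is an assumption chosen because it yields roughly the quoted 91.6%; this page does not state the actual data-to-parity ratio, and the function names are illustrative only.

```python
def replication_utilization(copies: int) -> float:
    """Usable fraction of raw capacity when each object is stored `copies` times."""
    return 1 / copies


def ec_utilization(data_fragments: int, parity_fragments: int) -> float:
    """Usable fraction of raw capacity for an erasure-coded layout."""
    return data_fragments / (data_fragments + parity_fragments)


def raw_capacity_tb(drives: int, tb_per_drive: float) -> float:
    """Raw capacity of an enclosure in TB, before any redundancy overhead."""
    return drives * tb_per_drive


# Multi-copy disaster recovery: 3 copies leave about a third usable,
# 4 copies leave a quarter, consistent with the "less than 30%" challenge.
three_copies = replication_utilization(3)
four_copies = replication_utilization(4)

# ASSUMPTION: a 22 data + 2 parity layout; the page only quotes 91.6%.
ec_22_2 = ec_utilization(22, 2)

# Performance pool from the text: 36 SSDs of 61.44 TB each in 2U.
pool_tb = raw_capacity_tb(36, 61.44)
pool_tb_per_u = pool_tb / 2

print(f"3 copies: {three_copies:.1%}, 4 copies: {four_copies:.1%}")
print(f"EC 22+2 (assumed): {ec_22_2:.1%}")
print(f"Performance pool: {pool_tb:.2f} TB raw, {pool_tb_per_u:.2f} TB per U")
```

Under these assumptions, moving from three or four full copies to high-ratio EC roughly triples the usable share of the same raw capacity.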

Cross-Region Sharing

• Across universities and institutions: EC can be deployed across up to 12 sites, with data accessible from any site.
• Between the RDM and HPC platforms: SmartSync enables data synchronization and mobility between service pools.
• Cross-service data sharing: Seamless multi-protocol interworking allows multiple services to access a single data copy, eliminating the resource waste caused by multiple copies.

Unified Management

• Solid data security: WORM and data encryption measures secure data from unauthorized changes or malicious access.
• Ease of use: A broad library of APIs supports integration with different applications, and tenant self-management interfaces enable on-demand management for different organizations.
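
To illustrate the WORM (write once, read many) idea mentioned above, here is a minimal in-memory sketch. It is not the product's API: `WormStore`, `WormViolation`, and the retention parameter are hypothetical names, and a real system enforces these rules inside the storage layer rather than in application code.

```python
import time


class WormViolation(Exception):
    """Raised when an operation would alter or remove protected data."""


class WormStore:
    """Hypothetical write-once store: objects cannot be overwritten,
    and cannot be deleted until their retention period has expired."""

    def __init__(self, clock=time.time):
        self._objects = {}   # key -> (data, retain_until timestamp)
        self._clock = clock  # injectable clock, convenient for testing

    def put(self, key: str, data: bytes, retention_seconds: float) -> None:
        if key in self._objects:
            raise WormViolation(f"{key!r} already exists and cannot be overwritten")
        self._objects[key] = (bytes(data), self._clock() + retention_seconds)

    def get(self, key: str) -> bytes:
        # Reads are always allowed.
        return self._objects[key][0]

    def delete(self, key: str) -> None:
        _, retain_until = self._objects[key]
        if self._clock() < retain_until:
            raise WormViolation(f"{key!r} is still under retention")
        del self._objects[key]
```

In this sketch a dataset written with a ten-year retention period can be read at any time, while attempts to overwrite it, or to delete it before the period ends, raise `WormViolation`.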
