To handle the diverse range of devices and new application workloads in data centers, high-speed data processing requirements, and new trends in data resilience, research is needed into data enablement and governance technologies for new scenarios such as AI, as well as into data resilience and high reliability. This will enable us to build data enablement and resilience technologies for emerging services.
1. Optimized computing with storage for AI data lakes: Building on the overall decoupled storage-compute architecture and focusing on scenarios such as large AI models, technologies such as using storage in place of computing and in-memory computation offloading accelerate training and inference and maximize system resource efficiency.
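One concrete instance of "using storage in place of computing" is predicate pushdown: the compute tier sends a filter down to the storage tier instead of pulling raw data across the network. The sketch below is a minimal illustration; the `StorageNode` class and its methods are hypothetical, not a real product API.

```python
# Minimal sketch of computation pushdown to the storage layer. Instead of
# shipping every record to the compute tier and filtering there, the compute
# tier sends the predicate down and the storage tier returns only matches,
# reducing data movement. All names here are illustrative.

class StorageNode:
    """Holds raw records and can evaluate simple predicates locally."""

    def __init__(self, records):
        self.records = records

    def scan(self):
        # Baseline path: ship every record to the compute tier.
        return list(self.records)

    def scan_with_pushdown(self, predicate):
        # Pushdown path: filter at the storage tier, move only matches.
        return [r for r in self.records if predicate(r)]

node = StorageNode([{"id": i, "label": i % 2} for i in range(1000)])

shipped_baseline = node.scan()
shipped_pushdown = node.scan_with_pushdown(lambda r: r["label"] == 1)

print(len(shipped_baseline))  # 1000 records cross the network
print(len(shipped_pushdown))  # 500 records cross the network
```

The same idea generalizes to aggregation and projection pushdown; the closer the operator runs to the data, the less the decoupled architecture pays in network transfer.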
2. Training/inference hyper-convergence foundation technologies: In enterprise application scenarios based on large AI models, heterogeneous computing power virtualization and scheduling technologies achieve efficient resource utilization, while model compression and network-storage-computing collaborative acceleration technologies speed up training and inference and improve model accuracy when resources are limited. Enterprise AI applications integrated with large models are also developed, such as robot applications built on integrated training and inference, enterprise AI applications, and large-model application technologies for enterprise privacy protection.
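To make "model compression" concrete, the sketch below shows one common technique, symmetric 8-bit post-training quantization, on a plain Python list. Real systems operate on tensors with per-channel scales; this is a hedged, minimal illustration only.

```python
# Sketch of symmetric int8 post-training quantization: map float weights to
# 8-bit codes with a single scale, shrinking storage ~4x at a bounded
# accuracy cost. Pure-Python stand-in for tensor-level implementations.

def quantize_int8(weights):
    """Map floats to int8 codes [-127, 127] with one symmetric scale."""
    scale = max(abs(w) for w in weights) / 127.0
    codes = [round(w / scale) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    return [c * scale for c in codes]

weights = [0.5, -1.27, 0.03, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# Rounding guarantees each weight is recovered within half a quantization step.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(max_err <= scale / 2)  # True
```

The error bound (half a quantization step per weight) is what lets compressed models retain accuracy when resources are limited, and finer-grained scales tighten it further.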
3. Data governance and knowledge repository storage technologies for AI data lakes and training/inference hyper-convergence: These technologies support automatic corpus extraction from documents in multiple formats; automatic classification and labeling of mass data; corpus quality evaluation; Retrieval-Augmented Generation (RAG) based on the convergence of knowledge graphs and vector databases; intelligent feature engineering and knowledge injection; ingestion of all enterprise data; and visualization, management, and availability of all data.
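The RAG variant mentioned above can be sketched as hybrid retrieval: nearest-neighbour search over a vector index, followed by one-hop expansion over knowledge-graph edges so the prompt context holds both similar passages and related entities. The embeddings, document names, and graph edges below are illustrative stand-ins.

```python
# Minimal sketch of knowledge-graph + vector-database retrieval for RAG.
# Step 1: rank documents by cosine similarity to the query embedding.
# Step 2: expand the top hits along knowledge-graph edges to pull in
# related entities the pure vector search would miss.

import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy 2-d embeddings standing in for a real vector database.
vector_index = {
    "doc_raid":    [0.9, 0.1],
    "doc_backup":  [0.8, 0.3],
    "doc_cooking": [0.0, 1.0],
}
# Toy knowledge-graph edges: document -> related entities.
knowledge_graph = {
    "doc_raid":   ["doc_erasure_coding"],
    "doc_backup": ["doc_snapshot"],
}

def hybrid_retrieve(query_vec, k=1):
    ranked = sorted(vector_index,
                    key=lambda d: cosine(query_vec, vector_index[d]),
                    reverse=True)
    hits = ranked[:k]
    expanded = []
    for h in hits:
        expanded.extend(knowledge_graph.get(h, []))
    return hits + expanded

context = hybrid_retrieve([1.0, 0.0], k=1)
print(context)  # ['doc_raid', 'doc_erasure_coding']
```

The retrieved context is then passed to the large model as grounding; the graph hop is what distinguishes this converged design from vector-only RAG.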
1. Data resilience technologies: Research is needed into intrinsic data storage resilience technologies, focusing on storage hardware, architectural data protection, and secure data flows across trusted domains. AI can be used to enhance data detection and security capabilities, addressing some of the data resilience challenges that our customers face in terms of business continuity.
2. System reliability: For complex cluster networks with thousands of nodes and multiple network protocols, it is important to build a data storage network with global topology awareness and to research fault location technologies for large-scale clusters, ensuring millisecond-level fault detection and notification for any hardware resource fault. The aim is to implement sub-second fault location and automatic recovery in storage clusters and to support sub-second fault rectification in limited-scale clusters.
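The basic mechanism behind fast fault detection is heartbeat monitoring: a node is marked suspect once no heartbeat arrives within a timeout. The sketch below uses a simulated millisecond clock; the timeout, node names, and `HeartbeatMonitor` class are illustrative assumptions (production systems layer gossip and topology awareness on top of this).

```python
# Sketch of heartbeat-based fault detection in a storage cluster. A monitor
# records the last heartbeat per node and reports any node whose silence
# exceeds the timeout, which is the primitive behind millisecond-level
# detection and notification. Timestamps are simulated, not wall-clock.

class HeartbeatMonitor:
    def __init__(self, timeout_ms):
        self.timeout_ms = timeout_ms
        self.last_seen = {}  # node -> last heartbeat timestamp (ms)

    def heartbeat(self, node, now_ms):
        self.last_seen[node] = now_ms

    def failed_nodes(self, now_ms):
        # A node is suspect once its silence exceeds the timeout.
        return sorted(n for n, t in self.last_seen.items()
                      if now_ms - t > self.timeout_ms)

monitor = HeartbeatMonitor(timeout_ms=5)
monitor.heartbeat("node-a", now_ms=0)
monitor.heartbeat("node-b", now_ms=0)
monitor.heartbeat("node-a", now_ms=4)  # node-b has gone silent

print(monitor.failed_nodes(now_ms=8))  # ['node-b']
```

Detection latency is bounded by heartbeat interval plus timeout, so both must be tuned aggressively (and false positives suppressed via topology awareness) to reach the millisecond-level targets above.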
3. Data protection technologies for emerging scenarios: For emerging applications and data infrastructure scenarios such as AI, big data, and multicloud, research is needed into technologies such as full-lifecycle, full-stack AI application protection, data-oriented disaster recovery, and backup and privacy protection, to address challenges such as the availability, security, and data privacy of AI systems. Research into disaster recovery, backup, and sharing technologies for massive applications and data in multicloud environments should be prioritized so that we can address challenges such as backup without compromising performance, real-time disaster recovery, and secure sharing on multicloud platforms.
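One building block behind "backup without compromising performance" is content-hash deduplication: only chunks not already stored are shipped to the backup target. The fixed-size chunking and the `DedupStore` class below are simplified stand-ins for real deduplicating backup systems, which use content-defined chunking and larger chunk sizes.

```python
# Sketch of deduplicating backup: split data into chunks, hash each chunk,
# and transfer only chunks whose hash is not yet in the store. Repeated
# backups of mostly-unchanged data then ship only the changed chunks.

import hashlib

def chunks(data, size=4):
    """Fixed-size chunking; real systems use content-defined boundaries."""
    return [data[i:i + size] for i in range(0, len(data), size)]

class DedupStore:
    def __init__(self):
        self.blobs = {}  # sha256 hex digest -> chunk bytes

    def backup(self, data):
        shipped = 0
        for c in chunks(data):
            h = hashlib.sha256(c).hexdigest()
            if h not in self.blobs:
                self.blobs[h] = c
                shipped += 1
        return shipped  # chunks actually transferred

store = DedupStore()
first = store.backup(b"AAAABBBBCCCC")   # all three chunks are new
second = store.backup(b"AAAABBBBDDDD")  # only the last chunk changed

print(first, second)  # 3 1
```

Because the second backup transfers one chunk instead of three, backup windows shrink roughly in proportion to the change rate, which is what keeps production performance unaffected.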