In the collective study sessions held by the Political Bureau of the Communist Party of China (CPC) Central Committee in October 2016 and December 2017, President Xi Jinping emphasized that China must promote technology, service, and data convergence (‘three-convergence’) to enable cross-layer, cross-region, cross-system, cross-department, and cross-service (‘five-cross’) collaborative management and services. This has become the guiding principle of government informatization in China. During the construction of Digital China, Audaque needs to comply with this principle and build a ‘five-cross’ data governance system. Such a system is the key to realizing the value of government Big Data. With the development of the national Big Data strategy and the establishment of the National Network Information Committee and Big Data bureaus in different regions, construction of ‘five-cross’ converged data governance systems has commenced in many places.
Although cities have ‘three-convergence’ and ‘five-cross’ as guiding principles, they still need a ‘five-cross’ data governance methodology to establish data governance systems and construct converged data governance platforms. Audaque has distilled its experience with Huawei’s ecosystem and practices into the Government Logical Data Model (GLDM) methodology, which standardizes the establishment of data governance systems and the construction of data governance integration platforms.
Avoiding ‘Columbus’s Dilemma’
The government’s data sources are like islands (most of them are information silos) that are ‘discovered’ when built. The era of ‘three-convergence’ and ‘five-cross’ is similar to the Age of Discovery in the 15th century, when small islands were discovered and connected to the rest of the world, and global trade was made possible. In effect, the current trend is making this an age of data discovery. In the Age of Discovery, early navigators, such as Christopher Columbus, were often confronted with a dilemma: When they embarked on an expedition, they had no known destination; when they made landfall, they did not know where they were; when they returned home, they had no name for the places they had been. The GLDM is a modern data navigation strategy that helps avoid a new ‘Columbus’s Dilemma,’ in which, during the construction of the ‘five-cross’ data governance system, we do not know what we can do (at the beginning), what needs to be done (during the process), or what we have done (in the end).
To be specific, the GLDM data navigation strategy for data governance system construction addresses ‘Columbus’s Dilemma’ with four elements:
• Map: The streamlining of government information resource catalogs is similar to developing a map in the Age of Discovery; the finished map identifies the locations of continents (large data users), small islands (data resources), reefs (sensitive data), and glaciers (data that is difficult to coordinate). ‘Five-cross’ is an essential feature of governmental data, and it distinguishes governmental data from enterprise data. Because of its hierarchical structure, the government cannot adopt the enterprise model of informatization, in which a unified national IT department would build and operate the information systems. Instead, each department or business team at each level must independently conduct informatization construction and operate its own information systems. This makes the streamlining of government information resource catalogs a top priority. The streamlining of these catalogs is a general survey of data and businesses, focused on the current state and on requirements. The contents include the following: 1) Responsibilities and services of bureaus and offices; 2) Processes and systems of each business; 3) Data that is generated and used by each service and system; 4) Databases; 5) Data organization methods; 6) Systems that are being built or planned; and 7) Types of data.
During the streamlining process, the following are collected and recorded: 1) Data and database generation systems and processes; 2) Data source departments; 3) Data storage locations; 4) Metadata (such as database types, data formats, data models, data standards, data update frequency, and data interfaces); and 5) Pain points and bottlenecks (such as business risks and information silos).
Finally, a panorama of city/regional governmental administration data is formed. Because government responsibilities are standardized by regulations such as the ‘Three Determinations’ (determination of the position, organization, and headcount) and by administrative authorization, the streamlining of government information resource catalogs is often similar among provincial, municipal, and county-level governments. Based on such similarities, the GLDM simplifies the streamlining of the catalogs of each government department that adopts it. The results of the catalog review are recorded in a metadata management system. In the governmental data governance system, the basic functions of the metadata management system form the government information resource catalogs.
• Route: The data sharing and exchange platform charts the course for data navigation. At present, there are many data sharing and exchange platform products, and many articles explore the theory and practice of data sharing and exchange, so they are not elaborated on in this document.
• Compass: The data standard, data supervision, and data compliance platforms are similar to compasses in that they prevent data governance system construction from going in the wrong direction. The construction of a data governance system is like building a data factory. The input of the factory is the as-is data (source data). The output includes the data resource (basic library and theme library), as well as the quality feedback and security supervision of the as-is data.
• Ship: The data quality governance platform and ‘five-cross’ data convergence platform are key devices in the data factory. They are the ships of data navigation, and the real navigation depends on the two platforms. The data quality governance platform is like a rudder, which controls the ship’s direction. The ‘five-cross’ data convergence platform is like an engine, which pushes the ship forward.
A data governance system that contains these four elements can be used to manage and monitor the metadata (information directories), standardization process, quality, and security. This system can also support the ‘three-convergence’ and ‘five-cross’ concept, expressing it concretely as ‘five-cross’ data standardization, consistency, timeliness, integrity, and entity consistency. In this way, a comprehensive data governance system can be established in which the governance of data catalogs, data standards, data quality, and data security is implemented and mutually reinforcing.
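To make the ‘map’ element more concrete, the following is a minimal sketch of how one catalog record might be represented in a metadata management system. It is an illustration under stated assumptions, not the GLDM’s actual schema: the class name `CatalogEntry`, its field names, the helper functions, and the sample departments are all hypothetical, chosen to mirror the items collected during catalog streamlining (source department, generating system, storage location, metadata, and pain points).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CatalogEntry:
    """One hypothetical record in a government information resource catalog."""
    resource_name: str       # the data resource (a "small island")
    source_department: str   # department that generates the data
    generating_system: str   # business system or process that produces it
    storage_location: str    # where the data is stored
    database_type: str       # metadata: database type
    data_format: str         # metadata: data format
    update_frequency: str    # metadata: how often the data changes
    sensitive: bool = False  # sensitive data (a "reef")
    pain_points: List[str] = field(default_factory=list)


def find_reefs(catalog: List[CatalogEntry]) -> List[str]:
    """Return the names of resources flagged as sensitive."""
    return [entry.resource_name for entry in catalog if entry.sensitive]


def resources_by_department(catalog: List[CatalogEntry]) -> Dict[str, List[str]]:
    """Group resource names by the department that owns them."""
    grouped: Dict[str, List[str]] = {}
    for entry in catalog:
        grouped.setdefault(entry.source_department, []).append(entry.resource_name)
    return grouped


# Illustrative sample data only; the departments and systems are made up.
catalog = [
    CatalogEntry(
        "social security payment records", "Social Security Bureau",
        "payment system", "bureau data center", "relational", "table",
        "monthly", sensitive=True, pain_points=["information silo"],
    ),
    CatalogEntry(
        "rental contracts", "Housing Bureau", "rental registration system",
        "municipal cloud", "relational", "table", "daily",
    ),
]

print(find_reefs(catalog))
print(resources_by_department(catalog))
```

Keeping sensitivity and pain points on the same record as the technical metadata is one possible design choice; it lets a single query over the catalog answer both ‘where is the data?’ and ‘what makes it hard to use?’.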
Data Standard Platform: Ensure ‘Five-Cross’ Standardization to Better Control Data Processing
A map is not enough for data navigation because the map can only inform us of the current data and data requirements. If we do not know the data processing goals and target data, we can still get trapped in Columbus’s Dilemma. A greater concern is that data processing becomes unpredictable and uncontrollable, with results that vary according to the person, time, and circumstances; therefore, we need to develop standards for our target data. The more refined these standards are, the more controllable the data processing procedure.
The as-is data is often business-oriented, and its modeling is driven by applications. This means that the as-is data presents us with information such as social security payment and compensation records, test reports, clinical cases, birth certificates, residence registration, rental contracts, and household registration records. On the other hand, the target data is resource-oriented, and its modeling is driven by general data. In essence, the target data is mapped to the physical world and integrates the data description of the city management service entities in the data space. The target data represents every person, certificate, enterprise, social organization, house, component, car, road, and event in the city.
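The shift from business-oriented as-is data to resource-oriented target data can be sketched as follows. This is only an illustration under assumptions: the record layout, the `citizen_id` key, the system names, and the function `build_person_entities` are hypothetical, and the merge rule (the first value seen for an attribute wins) is a simplification rather than the GLDM’s actual fusion logic.

```python
from collections import defaultdict

# Hypothetical as-is records from three business systems.
as_is_records = [
    {"system": "social_security", "citizen_id": "P001", "payment_month": "2017-05"},
    {"system": "household_registration", "citizen_id": "P001", "address": "District A"},
    {"system": "rental_contracts", "citizen_id": "P002", "address": "District B"},
]


def build_person_entities(records):
    """Regroup application-driven records into one entity per person."""
    entities = defaultdict(lambda: {"sources": [], "attributes": {}})
    for record in records:
        entity = entities[record["citizen_id"]]
        entity["sources"].append(record["system"])
        for key, value in record.items():
            if key not in ("system", "citizen_id"):
                # Simplification: keep the first value seen for each attribute.
                entity["attributes"].setdefault(key, value)
    return dict(entities)


persons = build_person_entities(as_is_records)
print(persons["P001"])
```

Each resulting entity describes a person rather than a transaction, and records which source systems contributed to it.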
The data standard platform should first implement the modeling of target data, including data encoding, data model, data storage, data exchange format, and data sharing interface standards, as well as data meta-standards.
Second, the data standard platform must enable gradual standardization from the source data to target data. Because the as-is data systems and databases have already been built, abandoning the model, code, type, dictionary, format, and interface of inventory data and starting all over again is costly. In addition, a large number of new smart applications will be deployed during Smart City construction, and a large amount of incremental data will be generated. If we adopt source business data standards that are compatible with the target data during new system construction, data waste will be greatly reduced and significant data cleansing costs can be saved; therefore, the data standard platform requires general business data standards and key dedicated business data standards, and we must adopt these standards in the initiation and acceptance of informatization projects.
Third, the data standard platform needs to implement standardization in the data processing procedure. Once both the as-is data and target data are standardized, it becomes easy to standardize the process of transforming as-is data into target data. In this way, we can construct the data factory in a standard manner and make the data factory a systematic, standardized, and intelligent data refinery. Data processing standards include data cleansing rules, data fusion process standards, and data quality assessment standards. The standardization of the target data, source data, and processes prevents detours, wrong directions, and other mistakes. The data standard platform has the following functions:
• Assists in formulating standards (through standard induction, discovery, and analysis).
• Manages existing standards.
• Ensures standard application in system design and development (through standard registration, release, subscription, and adoption registration).
• Performs standards compliance tests on inventory and incremental data, detects data problems using standards (through data error check), implements intelligent standardization on problem data, and solves the problems (through error correction).
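The last function above (compliance testing followed by error correction) can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the two-field standard, the legacy code table, and the function names `check` and `correct` are hypothetical and do not describe the platform’s real rules or interfaces.

```python
import re

# Hypothetical target standard: field name -> rule the value must satisfy.
STANDARD = {
    "gender": lambda v: v in {"M", "F"},
    "birth_date": lambda v: re.fullmatch(r"\d{4}-\d{2}-\d{2}", v) is not None,
}

# Hypothetical mapping from legacy dictionary codes to standard codes.
LEGACY_GENDER_CODES = {"1": "M", "2": "F", "male": "M", "female": "F"}


def check(record):
    """Compliance test: return the sorted names of non-compliant fields."""
    problems = []
    for name, rule in STANDARD.items():
        value = record.get(name)
        if not isinstance(value, str) or not rule(value):
            problems.append(name)
    return sorted(problems)


def correct(record):
    """Error correction: map legacy codes and reformat dates where possible."""
    fixed = dict(record)
    gender = str(fixed.get("gender", "")).strip().lower()
    fixed["gender"] = LEGACY_GENDER_CODES.get(gender, fixed.get("gender"))
    # Accept YYYY/MM/DD, YYYY.MM.DD, or YYYYMMDD and rewrite as YYYY-MM-DD.
    match = re.fullmatch(r"(\d{4})[/.]?(\d{2})[/.]?(\d{2})", str(fixed.get("birth_date", "")))
    if match:
        fixed["birth_date"] = "-".join(match.groups())
    return fixed


legacy_record = {"gender": "1", "birth_date": "1990/01/05"}
print(check(legacy_record))           # fields that violate the standard
print(correct(legacy_record))         # record after automatic correction
print(check(correct(legacy_record)))  # re-test after correction
```

Separating the check from the correction means that the same rules can be applied to inventory data and to incremental data, and that values the correction step cannot repair are still reported by a second check.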
Data Supervision and Data Compliance Platforms: Ensure Data Security and Prevent Risks
The data standard platform can solve the most difficult issue in data governance system construction: Standardization. However, data governance system construction has another important issue: Security. Information resource catalog streamlining reveals the data in each bureau and business system. How can Data Protection Authorities (DPAs) prevent security problems in the processing and application of source data and target data? How can they seal all data leakage points to prevent data loss, data breaches, malicious tampering, and illegal commercial use? DPAs must utilize data supervision platforms to ensure information security. In fact, data transactions, operations, openness, and sharing should be under effective data supervision to ensure healthy and orderly processes; otherwise, risks will accumulate in these activities and, as data legislation and data policies are introduced in the future, flare-ups can occur at any time. DPAs must supervise data exchanges and data operation companies to avoid data disorder. This is similar to the way Securities and Futures Commissions regulate stock exchanges to avoid problems such as an Internet financial crisis. “Render to Caesar the things that are Caesar’s; and to God the things that are God’s.” The market can develop and use data. However, data supervision is the government’s bottom-line responsibility in data transactions and operations, just as the finance bureau is responsible for supervising the financial industry, the land bureau for supervising land resources, and the cyberspace administration for content creation and public opinion surveys.
The General Data Protection Regulation (GDPR) for data supervision and protection in the EU took effect on May 25, 2018. Some of its provisions, such as the “right to be forgotten,” “right to data portability,” “right to be informed,” and general record keeping requirements, are exerting significant influence on Internet and Big Data enterprises in China. At the same time, the “owner principle” (long-arm jurisdiction principle) and “personal information exit principle” will affect China’s data sovereignty and data legislation. Data protection legislation, establishment of a DPA, and the specification of the DPA’s supervisory responsibilities must be implemented as soon as possible. The data supervision platform prevents data governance system construction from taking detours or the wrong paths.
In addition to the data supervision platform constructed by the DPA, enterprises and government bureaus that process personal information need to build a data compliance platform that will be managed by the DPA. The purpose is to ensure the implementation of data management measures and eliminate risks in data collection, processing, sharing, exchange, and openness.
Data Quality Governance and Data Convergence Platforms: Ensure Data Quality and Prevent GIGO
After creating the map, route, and compass of the data navigation process, we need a vessel to ship the data to the destination. The ship’s core components are the rudder (data quality governance platform) and engine (‘five-cross’ data convergence platform). Garbage In, Garbage Out (GIGO) can occur in ‘five-cross’ data processing via governmental administration applications in complex scenarios. In this case, data application makes the situation worse. The two platforms can prevent this from happening. During the transformation of the source data to target data, problems such as data duplication, data conflicts, data errors, and format disorder can emerge. There are two kinds of errors: Format errors and substantial errors. Format errors can be rectified by current technical measures such as automatic data cleansing. By contrast, substantial errors cannot be rectified in a fully automated manner. In addition, government departments often do not allow automatic data cleansing. To deal with substantial errors, source business systems or data responsible departments must manually modify data while complying with laws. However, large-scale manual intervention takes a long time, hindering data resource library construction. Therefore, we need both a data quality governance platform and a ‘five-cross’ data convergence platform. The data quality governance platform automatically detects data errors, introduces manual intervention to rectify substantial errors (the system provides recommended values), and controls the quality of source data. The ‘five-cross’ data convergence platform does not require manual intervention. It can ensure and improve data quality to the greatest extent and enable precise decision analysis applications. While ensuring correct statistical significance, the ‘five-cross’ data convergence platform can continuously deal with all data problems in the background and build a data resource library in the shortest time.
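The division of labor between automatic cleansing and manual intervention can be sketched as a simple triage step. The sketch rests on assumptions that are not in the methodology itself: a ‘format error’ is reduced to stray whitespace, a ‘substantial error’ is reduced to conflicting values reported by different sources for the same field, and the recommended value is simply the most frequent candidate. The function names and the result layout are hypothetical.

```python
from collections import Counter


def clean_format(value):
    """Automatic fix for a format error: trim and collapse whitespace."""
    return " ".join(value.split())


def triage(entity_id, field_name, source_values):
    """Resolve format errors automatically; route conflicts to manual review."""
    cleaned = [clean_format(v) for v in source_values]
    counts = Counter(cleaned)
    if len(counts) == 1:
        # Only formatting differed, so no human decision is needed.
        return {"status": "auto", "value": cleaned[0]}
    # Sources disagree in substance: recommend a value, but let the
    # responsible department make the actual change.
    recommended, _ = counts.most_common(1)[0]
    return {
        "status": "manual_review",
        "entity_id": entity_id,
        "field": field_name,
        "candidates": sorted(counts),
        "recommended": recommended,
    }


print(triage("P001", "address", ["District A", " District  A "]))
print(triage("P001", "address", ["District A", "District B", "District A "]))
```

In this sketch the system never overwrites a conflicting value on its own; it only proposes one, which matches the requirement that substantial errors be corrected by the source business system or the responsible department.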
The data quality governance platform combines technical measures and management mechanisms to control source data quality and to appraise the data sharing performance of commissions, bureaus, and offices on an objective basis. The ‘five-cross’ data convergence platform is like a factory with round-the-clock data production lines that refine as-is source data into target data resources. In the GLDM methodology, the data quality governance platform is a data ‘Skynet’ system consisting of three data governance network layers (exploration network, standard network, and quality network). The ‘five-cross’ data convergence platform is a data factory with a production line-based structure of six layers (history, standard, atom, integration, data mart, and application).
GLDM: Building Blocks for the China Data Governance Solution
Information resource catalog streamlining (map), the data sharing and exchange platform (route), the data standard and data supervision platforms (compass), and the data quality governance and ‘five-cross’ data convergence platforms (ship) constitute the GLDM ‘five-cross’ data governance methodology for data navigation. Through the cooperation between Audaque and Huawei, this methodology has become a vehicle for accumulating knowledge, and it now provides guidance for best practices. It enables Big Data center construction to avoid the detours and mistakes of early exploration. As a practical exploration of the ‘three-convergence’ and ‘five-cross’ principle, data center construction using the GLDM methodology has proved successful.
Over the past 30 years, Logical Data Models (LDMs) have played a vital role in many fields, such as finance, telecommunications, energy, and transportation. Teradata, the data warehouse leader, has become one of the most important data companies in the world with its insights into LDMs in many industries. Worldwide, however, there is a lack of large-scale ‘three-convergence and five-cross’ practices and of logical models for cross-department and cross-service governmental data. This means that a ‘five-cross’ LDM is currently unavailable. The development of the GLDM methodology fills this gap. By continuously summarizing the data center and data governance system construction experience of each province, city, district, and county, the GLDM helps gradually improve data center and data governance system construction at all levels.
At the Big Data Expo in May 2017, the GLDM methodology received extensive attention and was reported heavily by the People’s Daily Online, China News Service, Phoenix Financial Daily Report, and Guizhou local media. Audaque is working with more provincial-, municipal-, district-, and county-level Big Data centers, Big Data bureaus, economic informatization commissions, cyberspace administrations, and digitization offices to summarize and share more experience in data governance system construction and enrich the GLDM methodology.
At the Huawei Partner Conference 2017, Audaque and Huawei jointly released the governmental data governance and convergence solution with GLDM as the methodology. This solution has been demonstrated several times and won wide recognition. Audaque is willing to work with governmental data authorities to explore, practice, and create building blocks for China’s Data Governance Solution.