Finding Value in Big Data Analytics
By Dr. Li Deyi, Honorary Member, Cloud Computing Expert Committee of the Chinese Institute of Electronics
Cloud computing innovations are actively contributing to the success of new business models and have a profound impact on the IT industry. In China, the cloud computing market is projected to reach US$122 billion to US$162.7 billion over the course of the Twelfth Five-Year Plan. As a result, global IT vendors are competing fiercely to grow their businesses and dominate the field.
Cloud Computing and Big Data
Today, the concept of “Big Data” is usually associated with cloud computing. Big Data can be of enormous value when transformed through data mining or application services, or when used for scientific research in fields such as the environment, energy, meteorology, aerospace, biology, and finance. But what exactly is the relationship between cloud computing and Big Data? Without the Internet, cloud computing would not exist; and without cloud computing, Big Data processing technology would not exist.
Big Data processing technology in cloud computing environments faces new challenges. A traditional relational database cannot process and analyze Big Data while huge numbers of users read and write concurrently. Such systems lack efficient storage platforms and efficient access to massive data volumes, and they cannot provide high availability or scalability. To address these problems, some vendors have developed new technologies such as distributed caching, distributed file systems, non-relational databases, and new types of relational databases.
Similarly, traditional data processing technology is not suitable for processing the massive volumes of data collected by today’s cloud computing machines, especially when the information is semantically or geographically dispersed. This problem also challenges distributed parallel processing technology, and new processing technologies have emerged in response, such as MapReduce, parallel data processing, incremental processing, and streamed computing.
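To make the MapReduce idea mentioned above concrete, here is a minimal single-machine sketch in Python. It is an illustration only, not how any particular framework is implemented: the function names (map_phase, shuffle, reduce_phase, word_count) and the word-counting task are assumptions chosen for the example. In a real deployment the map and reduce steps would run in parallel across many machines, which is what makes the model suitable for dispersed data.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence (hypothetical driver)."""
    pairs = []
    for document in documents:
        # Each document could be mapped on a different machine.
        pairs.extend(map_phase(document))
    grouped = shuffle(pairs)
    # Each key could be reduced on a different machine.
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data in the cloud", "cloud data"])
print(counts)
```

Because each document is mapped independently and each key is reduced independently, the work can be split across as many machines as are available.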
For cloud computing, data centers are critical for storing and accessing important data assets. Big Data processing and analytics depend heavily on the resources and capabilities of cloud computing to explore relationships in data. There are some very good examples. The New York Times used cloud computing technology to convert 400,000 scanned newspaper articles dating from 1851 to 1922. Using several hundred computers, it took only 36 hours to complete this monumental task. Another example is Visa Inc., which adopted Hadoop-based processing and completed the computation of two years of transaction records in 13 minutes. The data set consisted of 73 billion transaction records and required more than 36 TB of storage, and the job would have taken an entire month using traditional processing technology.
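The scale of the Visa example is easier to appreciate with some back-of-the-envelope arithmetic. The short Python sketch below uses only the figures quoted above, plus two labeled assumptions: that "an entire month" means 30 days, and that 36 TB means 36 × 10^12 bytes. Since the source says "more than 36 TB," the per-record size is a lower bound, and all results are rough estimates rather than reported measurements.

```python
records = 73_000_000_000            # transaction records (from the article)
minutes_hadoop = 13                 # Hadoop-based run time (from the article)
storage_bytes = 36 * 10**12         # assumption: decimal terabytes, lower bound
minutes_traditional = 30 * 24 * 60  # assumption: "a month" taken as 30 days

# Average number of records processed per second during the 13-minute run.
records_per_second = records / (minutes_hadoop * 60)

# How many times faster the Hadoop-based run was than the month-long job.
speedup = minutes_traditional / minutes_hadoop

# Average storage used per transaction record.
bytes_per_record = storage_bytes / records

print(f"{records_per_second:,.0f} records per second")
print(f"{speedup:,.0f}x faster than a 30-day job")
print(f"{bytes_per_record:.0f} bytes per record")
```

Under these assumptions, the run works out to tens of millions of records per second and a speedup of more than three thousand times.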
Cloud computing is considered an extremely important investment area, despite a number of valid concerns over the technology. One major concern is the “cloud bubble.” Surveys show that huge investments have been poured into cloud systems, yet less than 20 percent of the built capacity is actually in use. Unfortunately, cloud computing centers are often used to bolster public image or, even worse, are merely added to a company’s commercial real estate portfolio, serving little purpose except to depreciate in value. To ensure the scalable growth of cloud computing as a green technology, innovative applications are essential for the industry.
Another serious concern is that cloud computing is being trumpeted as the solution to all problems. It seems that everything can be cloud-based and that everything on the Internet is being linked to cloud computing in some way. The term has become so ubiquitous that consumers and investors often can’t tell whether they are actually dealing with cloud computing. What, then, are the essential characteristics of cloud computing? First, cloud computing is an Internet-based computing technology with public participation. It relies heavily on network servers and Internet access and, more importantly, the resources it requires are provided not by clients but by servers on the company network or leased from a provider. This means the network provides computing capabilities, storage capacity, software functionality, and information services to businesses and individuals. Second, cloud computing services are highly scalable and dynamically adjustable to meet user demand. Service resources and capabilities can increase automatically within a few minutes, or even seconds, to respond to peak traffic on a network, and can scale back dynamically when demand decreases.
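The second characteristic, dynamic scaling, can be illustrated with a very small sketch. The Python function below is a hypothetical rule of thumb, not the policy of any real cloud provider: the name desired_instances, the parameters, and the example numbers are all assumptions made for illustration. It computes how many server instances are needed for the current load and keeps the result within fixed lower and upper limits.

```python
import math


def desired_instances(current_load, capacity_per_instance,
                      min_instances=1, max_instances=100):
    """Return how many instances should run for the given load.

    current_load and capacity_per_instance use the same unit,
    for example requests per second (hypothetical parameters).
    """
    # Round up so the pool is never under-provisioned.
    needed = math.ceil(current_load / capacity_per_instance)
    # Keep the result between the configured floor and ceiling.
    return max(min_instances, min(max_instances, needed))


# Peak traffic: scale out.
peak = desired_instances(current_load=950, capacity_per_instance=100)
# Quiet period: scale back to the floor.
quiet = desired_instances(current_load=0, capacity_per_instance=100)
print(peak, quiet)
```

A scheduler that re-evaluates such a rule every few seconds gives the behavior described above: capacity grows during peak traffic and shrinks again when demand falls.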
Extracting Value from Big Data
As the mobile Internet inevitably grows in popularity, data mining is the only way to tap the potential of large-scale data with low value density. Big Data mining for the mobile Internet mainly refers to mining unstructured data in a network environment. To maximize the revenue derived from data mining, future applications will also shift to focus on specific demographics or niche needs. The mobile Internet architecture is designed from the “bottom up” rather than the “top down,” which emphasizes the authenticity and timeliness of data mining in order to find correlations, identify trends, and ultimately determine the value of the data.
The online behavior of netizens is rapidly shifting towards content creation and collective intelligence and is no longer limited to searching and browsing. To understand the public, the majority, and the minority, we can identify and examine social behaviors, especially competitive and cooperative behaviors, from both micro and macro perspectives. We need to pay special attention to the use of Internet-based Big Data mining methods, also known as community technology and community discovery technology. For example, Threadless, an online T-shirt retailer and creative group, uses its website to promote products and services. On its website, people can submit designs and vote for the most creative T-shirts, and the designs with the most votes earn a considerable bonus. But the biggest attraction of the website is that it encourages people to share their T-shirt patterns and gain recognition from the public. The Threadless success story illustrates how a business can establish a win-win position in both online retail and online community building. Every week, Threadless receives more than 800 new designs, and every day more than 1,000 new users register to discuss design and art on its website. Inspired by users’ designs, Threadless also recommends music and videos to match them.
Big Data represents a new revenue source built around interaction: Internet-based data mining techniques will transform information services according to the value extracted and converted from data sources. With accelerated computing and storage support from cloud computing technologies, Big Data will create even more value.