Thinking and Modeling for Big Data from the Perspective of the I Ching
Chuang Lin,Guoliang Li,Zhiguang Shan,Yong Shi
DOI: https://doi.org/10.1142/s0219622017500286
2017-01-01
International Journal of Information Technology & Decision Making
Abstract: Data is growing faster than ever before and is changing our daily life. However, big data remains challenging to manage [F. H. Cate, The big debate, Science 346 (2014) 810; J. Manyika, M. Chui, B. Brown, J. Bughin, R. Dobbs, C. Roxburgh and A. H. Byers, Big Data: The Next Frontier for Innovation, Competition, and Productivity (McKinsey Global Institute, 2011); S. Lohr, The Age of Big Data (New York Times, 2012), p. 11; L. Einav and J. Levin, Economics in the age of big data, Science 345 (2014) 715; M. J. Khoury and J. P. A. Ioannidis, Big data meets public health, Science 346 (2014) 1054–1055; V. Marx, Biology: The big challenges of big data, Nature 498(7453) (2013) 255–260]. In this paper, we propose big data thinking and modeling techniques from the perspective of the I Ching, a well-known theory of imaginal thinking in China with a 3,000-year history. The I Ching has proven useful and practical in many domains, e.g., the Thirty-Six Stratagems. Firstly, inspired by the three components of the I Ching (image, number, and principle), we propose a new three-cycle way of thinking about big data: from data to phenomenon, from phenomenon to correlation, and from correlation to knowledge. This generalizes the fourth paradigm (from causality to correlation) proposed by Jim Gray. Secondly, inspired by the three entities of the I Ching (heaven, earth, and human), we propose a new big data modeling method. We use the three entities to represent big data. We map the four V's of big data (volume, variety, velocity, veracity) to four relations of opposition and unity in the I Ching, and from these generate the eight diagrams. By capturing the relationships between the eight diagrams, we generate the 64 hexagrams and use them to model big data. We also provide the principle rules for understanding the knowledge generated by the model.
Thirdly, we discuss how to use our model to describe big data management tools, including MapReduce, Spark, and Storm. We also provide a new model for handling distributed data streams. We believe that this work offers a new, practical way of thinking about and modeling big data, and that it will open up many new research directions on big data.
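
The combinatorial structure behind the modeling step can be sketched in a few lines. This is only an illustration under stated assumptions, not the paper's model: it encodes each of the eight diagrams (trigrams) as three yin/yang lines, treats the four opposition pairs as line-by-line complements (the traditional pairing Heaven/Earth, Thunder/Wind, Water/Fire, Mountain/Lake), and forms the 64 hexagrams as ordered (lower, upper) pairs of trigrams. The abstract does not say which V corresponds to which pair, so no such assignment is made here; the function names `opposite`, `opposition_pairs`, and `hexagrams` are hypothetical.

```python
from itertools import product

# Trigram lines listed bottom to top; 1 = yang (solid), 0 = yin (broken).
# The binary encoding is an illustrative assumption, not taken from the paper.
TRIGRAMS = {
    (1, 1, 1): "Heaven (Qian)",
    (0, 0, 0): "Earth (Kun)",
    (1, 0, 0): "Thunder (Zhen)",
    (0, 1, 1): "Wind (Xun)",
    (0, 1, 0): "Water (Kan)",
    (1, 0, 1): "Fire (Li)",
    (0, 0, 1): "Mountain (Gen)",
    (1, 1, 0): "Lake (Dui)",
}


def opposite(trigram):
    """Return the trigram with every line flipped (yin <-> yang)."""
    return tuple(1 - line for line in trigram)


def opposition_pairs():
    """Group the eight trigrams into four complementary pairs."""
    seen, pairs = set(), []
    for trigram in TRIGRAMS:
        if trigram not in seen:
            other = opposite(trigram)
            pairs.append((TRIGRAMS[trigram], TRIGRAMS[other]))
            seen.update({trigram, other})
    return pairs


def hexagrams():
    """Enumerate all ordered (lower, upper) trigram pairs: 8 * 8 = 64."""
    return list(product(TRIGRAMS, repeat=2))


if __name__ == "__main__":
    for first, second in opposition_pairs():
        print(f"{first} <-> {second}")
    print(len(hexagrams()), "hexagrams")
```

Under this encoding, four binary oppositions account for all eight trigrams, and the hexagram space is the Cartesian product of the trigram set with itself.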