Software-Defined Infrastructure For Decentralized Data Lifecycle Governance: Principled Design And Open Challenges
Gang Huang, Chaoran Luo, Kaidong Wu, Yun Ma, Ying Zhang, Xuanzhe Liu
DOI: https://doi.org/10.1109/ICDCS.2019.00166
2019-01-01
Abstract: Exploring and mining the explosive growth of "big data" has already produced many innovative applications, most notably recent advances in AI, and has thus created great value for human society and civilization. However, because data governance activities (creation, sharing, exchange, management, analytics, tracing, and accounting) follow centralized patterns, the potential value of big data distributed across the Internet remains far from adequately explored. The recent introduction of data protection policies and laws such as the GDPR makes the problem even more challenging. We are now at a moment of truth, when the data governance infrastructure should be reconsidered and redesigned. In this paper, we propose a decentralized, software-defined infrastructure design: data owners can implement their own rules and deploy them to the application systems where the data are produced, and these rules then govern further activities on the data. This approach closely resembles software-defined networking (SDN), where users deploy rules to switches to customize how the network behaves. Our principled infrastructure design can radically reshape current data governance activities into a decentralized topology. On the one hand, data can be separated from the applications that generate them, and data owners retain full rights to decide where their data are stored and how they are shared. On the other hand, data users can search, discover, integrate, and analyze data from various sources according to their application requirements and scenarios. As a result, we argue that our infrastructure can establish a new generation of responsive, decentralized data governance that promotes innovation by linking data, while adapting better to the open environment and to diverse user requirements.
From this perspective, we briefly discuss some key insights and enumerate several related emerging technologies and open challenges.
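To make the SDN analogy more concrete, here is a minimal Python sketch, built on our own assumptions, of how owner-defined rules might be evaluated at the application where data are produced. The names `AccessRequest`, `Rule`, and `OwnerPolicy`, the priority-ordered match-action lookup (loosely modeled on an SDN flow table), and the default-deny fallback are all hypothetical illustrations, not the infrastructure described in the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, List


# Hypothetical request: a data user asking to access one field of an owner's data.
@dataclass(frozen=True)
class AccessRequest:
    user: str
    purpose: str
    field_name: str


# Hypothetical match-action rule, loosely modeled on an SDN flow-table entry.
# Only `priority` takes part in ordering; `match` is a predicate on the request.
@dataclass(order=True)
class Rule:
    priority: int
    action: str = field(compare=False)  # "allow" or "deny"
    match: Callable[[AccessRequest], bool] = field(compare=False)


class OwnerPolicy:
    """Rules a data owner deploys to the application that produces the data."""

    def __init__(self, default_action: str = "deny") -> None:
        self.rules: List[Rule] = []
        self.default_action = default_action

    def install(self, rule: Rule) -> None:
        self.rules.append(rule)
        # Highest-priority rules are checked first, like a flow-table lookup.
        self.rules.sort(reverse=True)

    def decide(self, req: AccessRequest) -> str:
        # The first matching rule wins; otherwise fall back to the default action.
        for rule in self.rules:
            if rule.match(req):
                return rule.action
        return self.default_action


# Usage: the owner permits research use but never shares location data.
policy = OwnerPolicy()
policy.install(Rule(10, "allow", lambda r: r.purpose == "research"))
policy.install(Rule(20, "deny", lambda r: r.field_name == "location"))

print(policy.decide(AccessRequest("alice", "research", "steps")))     # allow
print(policy.decide(AccessRequest("alice", "research", "location")))  # deny
print(policy.decide(AccessRequest("bob", "advertising", "steps")))    # deny
```

Because matching depends only on rule priority and the request, an owner can update the deployed rules without changing the application code. This is the property the SDN comparison draws on, although a real design would also have to handle storage location, provenance, and auditing.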