Introduction to the special section on peer-to-peer computing and web data management
Aoying Zhou
DOI: https://doi.org/10.1007/s11704-008-0031-x
2008-01-01
Frontiers of Computer Science in China
Abstract: Peer-to-peer (P2P) computing has attracted attention from researchers and practitioners in several fields of computer science, including networking, distributed computing, and databases. In a P2P environment, data management becomes a challenging issue. There has been broad and in-depth research on this topic, covering data integration, query processing, and fine-grained data sharing. Given the growing interest in data-intensive computing and data cloud computing in both industry and academia, we have organized this special section to cover some important progress in P2P computing from the database perspective. The five papers included in this section were reviewed and selected by prominent experts in the related fields. The topics they cover are: distributed storage of large-volume web data, distributed group-based resource management, querying of high-dimensional data over P2P systems, distributed query processing in flash-based sensor networks, and data lineage tracing in P2P systems.
With the development of the Web, the storage and utilization of web data have become a major challenge for the data management research community because of the data's heterogeneity and dynamic nature. Yang et al. survey a data model called Wide Table, which was proposed for managing web data. The requirements and challenges of web data management are discussed, and important existing techniques for logical presentation, physical storage, and query processing are carefully analyzed.
Zhang's paper also addresses distributed resource management. It presents a framework for distributed group-based resource management that suits various web communities, such as interest-based organizations. The Chord protocol is adapted to organize nodes in groups, and a new communication protocol is proposed for nodes from different groups. With this framework, group activity analysis becomes easy, and good scalability, high search efficiency, and system robustness can be achieved.
P2P systems are widely used for sharing and exchanging data and resources among large numbers of computer nodes. Data objects can be identified by high-dimensional feature vectors. Supporting k-nearest-neighbor (KNN) queries over high-dimensional data objects in P2P systems is an important and challenging problem. The paper by Li et al. addresses it, proposing efficient query algorithms and solutions for the challenges raised by high dimensionality.
Xu's paper addresses data management techniques for distributed in-network data storage. Such techniques are needed in storage-centric sensor networks in which every sensor node is equipped with high-capacity flash memory. Sensor data can then be stored and managed inside the network, reducing communication among nodes. The paper addresses the challenge of designing storage management and indexing structures that take both the sensor system workload and the characteristics of flash memory into account.
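For readers unfamiliar with Chord, which the group-based resource management framework above adapts, the following is a minimal sketch of the standard Chord placement rule only: a key is stored on the first node whose identifier is equal to or follows the key's identifier on a circular identifier space. It does not reproduce the adapted protocol or the inter-group communication protocol from Zhang's paper. The names `chord_id`, `Ring`, `successor`, and `lookup`, the 32-bit identifier space, and the node labels are assumptions made for illustration.

```python
import hashlib
from bisect import bisect_left

M = 32  # number of identifier bits (assumed for this sketch)


def chord_id(key: str, m: int = M) -> int:
    """Hash a string onto the circular identifier space [0, 2**m)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)


class Ring:
    """A static, centralized view of a Chord-style ring (hypothetical helper).

    A real Chord node only knows a few other nodes (its finger table);
    here every node is known up front to keep the placement rule visible.
    """

    def __init__(self, node_names):
        # Sort nodes by identifier so the ring can be searched with bisect.
        self.nodes = sorted((chord_id(name), name) for name in node_names)
        self.ids = [ident for ident, _ in self.nodes]

    def successor(self, ident: int) -> str:
        """Return the first node whose identifier is >= ident, wrapping around."""
        idx = bisect_left(self.ids, ident)
        if idx == len(self.ids):  # past the largest identifier: wrap to the start
            idx = 0
        return self.nodes[idx][1]

    def lookup(self, key: str) -> str:
        """Return the name of the node responsible for a key."""
        return self.successor(chord_id(key))


if __name__ == "__main__":
    ring = Ring(["node-a", "node-b", "node-c", "node-d"])
    for key in ["photo.jpg", "report.pdf", "song.mp3"]:
        print(key, "->", ring.lookup(key))
```

Because responsibility depends only on identifier order, a node joining or leaving changes the owner of keys in a single arc of the ring, which is the property that makes this kind of organization scale.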