Abstract: With the advances of information and communication technologies, it is critical to improve the efficiency and accuracy of modern data processing techniques. The past decade has witnessed tremendous technical advances in sensor networks, the Internet/Web of Things, cloud computing, mobile/embedded computing, spatial/temporal data processing, and big data, and these technologies have provided new opportunities and solutions for data processing. Big data is an emerging paradigm applied to datasets whose size is beyond the ability of commonly used software tools to capture, manage, and process within a tolerable elapsed time. Such datasets often come from various sources (Variety), such as social media, sensors, scientific applications, surveillance, video and image archives, Internet texts and documents, Internet search indexing, medical records, business transactions, and web logs; they are frequently unstructured, large (Volume), and flow in and out quickly (Velocity). More importantly, big data must be of high value (Value) and trustworthy enough to support business decision-making (Veracity). Various technologies are being discussed to support the handling of big data, such as massively parallel processing databases, scalable storage systems, cloud computing platforms, and MapReduce. Big data is more than simply a matter of size; it is an opportunity to find insights in new and emerging types of data and content, to make business more agile, and to answer questions that were previously considered beyond our reach. This special issue aims to present emerging issues in big data research and approaches to addressing them. Original research articles were solicited in all aspects, including theoretical studies, practical applications, and experimental prototypes. The submitted manuscripts were reviewed by experts from both academia and industry, and after two rounds of review, the highest-quality manuscripts were accepted for this special issue.
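MapReduce, one of the technologies listed above, splits a computation into a map phase that emits key-value pairs and a reduce phase that aggregates all values sharing a key, with a shuffle step in between that groups values by key. The following is a minimal single-process sketch of that model for word counting. It is an illustration only: the function names are hypothetical, the input is assumed to fit in memory, and none of the distribution, fault tolerance, or disk-based shuffling that real frameworks provide is modeled.

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator


def map_words(document: str) -> Iterator[tuple[str, int]]:
    """Map phase (hypothetical helper): emit (word, 1) for each lowercase token."""
    for word in document.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: Iterable[int]) -> tuple[str, int]:
    """Reduce phase (hypothetical helper): sum all partial counts for one word."""
    return word, sum(counts)


def map_reduce(
    inputs: Iterable[str],
    mapper: Callable[[str], Iterable[tuple[str, int]]],
    reducer: Callable[[str, Iterable[int]], tuple[str, int]],
) -> dict[str, int]:
    """Run map, shuffle (group by key), and reduce sequentially in one process."""
    groups: dict[str, list[int]] = defaultdict(list)
    for record in inputs:
        for key, value in mapper(record):
            groups[key].append(value)  # shuffle: collect every value under its key
    # Reduce each key group independently; a real framework would do this in parallel.
    return dict(reducer(key, values) for key, values in groups.items())


if __name__ == "__main__":
    docs = ["big data is big", "data mining at scale"]
    print(map_reduce(docs, map_words, reduce_counts))
    # {'big': 2, 'data': 2, 'is': 1, 'mining': 1, 'at': 1, 'scale': 1}
```

Because each reduce call depends only on the values of its own key, the reduce phase can be spread across machines without coordination, which is what makes the model attractive for large datasets.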
In total, we received 20 manuscripts, of which 11 papers were accepted. Five of the accepted papers were selected from the SKG2015 conference and extended with about 50% new content. This special issue will be published in Concurrency and Computation: Practice and Experience.
To detect and describe urban emergency events in real time, Z. Xu et al. 1 propose a crowdsourcing-based knowledge base model that draws on information from social media. X. Lin et al. 2 introduce a comprehensive protection scheme that takes into account the characteristics of distributed generation (DG) and its ability to support local loads, adopts different protection strategies, and achieves fault isolation and island division through coordination between protection and automatic devices. J. Zheng 3 applies machine learning to extract feature information and mine association rules in order to identify and label unknown protocols; by marking protocol fingerprint information, unknown protocols in a specific environment can be discovered and analyzed. M. Xie 4 presents a meta-analysis that integrates results from 20 years of research on network position in organizational innovation. Observing that existing partition approaches for real social community networks disregard their directed and weighted characteristics, K. Gao 5 proposes a novel algorithm for pervasive sensing environments. X. Wei 6 first classifies and compares current search engines, particularly from the perspective of search patterns, which consist of index structure, user profiles, and interaction mechanism, and then presents a novel search pattern named ExNa, defining its model and basic operations in detail. Z. Bi 7 proposes two attribute reduction methods based on minimum decision cost, from the algebraic view and the information-theoretic view, respectively. Y. Wang 8 designs an optimization algorithm to address the load imbalance of the METGRID module; numerical experiments show that the algorithm alleviates this imbalance to some extent, and that after optimization the METGRID and REAL modules run about 7.2 times faster on 64 cores than before. Y. Xue 9 proposes an Event Space Model for multi-view event analysis, in which each event is mapped into a Network Public Opinion Data Space (OS) and an Actual Behavior Data Space (BS). S. Zhang 10 proposes a model that estimates the out-degree of any node in an ALN from the semantic-feature view, which can greatly narrow the search scope for rapidly locating Web resources stored in large-scale databases. J. Zhang 11 shows how to mine massive patent texts to obtain valuable information and knowledge from the large-scale candidates extracted from them.
The guest editors would like to thank Prof. Geoffrey C. Fox, Editor-in-Chief of Concurrency and Computation: Practice and Experience, whose help and trust were essential to the success of this special issue. The guest editors would also like to thank the reviewers for their high-quality reviews, which provided insightful and constructive feedback to the authors.