Emerging Intelligent Big Data Analytics for Cloud and Edge Computing.
Fang Dong, Jianming Yong, Xiang Fei
DOI: https://doi.org/10.1002/cpe.5989
2020-01-01
Concurrency and Computation: Practice and Experience
Abstract: Intelligent big data analytics is an emerging paradigm in the age of big data, analytics, and artificial intelligence; it explores how to use artificial intelligence to enhance big data analytics for various applications [1]. Because cloud computing cannot meet the strict computing-time requirements of latency-critical big data analysis applications, edge computing has emerged as a way to address the drawbacks of cloud-based solutions by moving computation physically closer to the network edge, where data are generated. However, edge computing does not have sufficient resources for complex intelligent big data analytics tasks. Consequently, this special issue focuses on key techniques for intelligent big data analytics that involve both cloud and edge computing.
Faced with an avalanche of biological interaction data, computational biology now confronts greater challenges in big data analysis and requires more studies that mine and integrate cloud-based multiomics data, especially when the data relate to infectious diseases. Meanwhile, machine learning techniques have recently succeeded in various computational biology tasks. For this reason, Chen et al. [2] proposed APEX2S, a novel two-layer machine learning model for discovering protein-protein interaction data. APEX2S narrows the focus to host-pathogen protein-protein interactions, aiming to apply machine learning techniques to learn from the interaction data and make predictions.
Human action recognition now has a wide variety of applications, such as surveillance, robotics, health care, video search, and human-computer interaction. However, recognizing human actions in videos involves many challenges, such as cluttered backgrounds, occlusions, viewpoint variation, execution rate, and camera motion.
To address these challenges, Zhao et al. [3] proposed a novel action recognition method that improves recognition accuracy by adopting key frame extraction and multi-feature fusion techniques. A key frame extraction method based on node contribution weighting extracts the video key frames, and different convolutional neural networks produce the corresponding classification results, which are then merged so that the information in the different flows is better complemented.
Many applications are now deployed on virtual machines (VMs), or even Spot VMs, rented elastically from public clouds. To save costs, interval-priced VMs are not released until the end of their rented intervals. Such delayed control effects cause existing methods to rent or release too many VMs, leading to over-control. Fluctuating prices make Spot VMs unreliable because of unexpected termination, which makes fault-tolerant strategies crucial. To decrease the VM rental cost while guaranteeing the SLA and robustness, Cai et al. [4] proposed UCM, a hybrid control method that takes advantage of queuing-model-based loosely coupled controllers, an unequal-interval-based collaborating method, and an existing group-based fault-tolerant strategy.
Lidar-based city object detection has attracted interest along with the development of laser scanning equipment, which has been widely applied in applications such as 3D building reconstruction and navigation. Superpixel segmentation is widely applied to image processing and computer vision tasks. Many experiments have shown that superpixels, generated from atomic, meaningful pixel regions, can improve processing efficiency while losing little information from the original image. Therefore, Mao et al. [5] describe a city object detection algorithm for airborne Lidar images that uses superpixel segmentation and DenseNet classification. A three-block DenseNet classifies the superpixels into four main types of city objects (building, road, field, and railway).
In addition, a graph-based neighborhood adjustment algorithm is designed to further improve the classification results.
Virtual network embedding (VNE) addresses how to efficiently allocate physical resources to a virtual network. However, this problem has been proved to be NP-hard. To address the challenge, Wang et al. [6] formalize it as a mixed integer programming problem and propose a novel VNE method based on reinforcement learning. To solve the problem, they introduce a pointer network that generates virtual node mapping strategies through an attention mechanism, and they design a reward function related to link resource consumption to connect the node mapping and link mapping stages of VNE.
Dual-hop 60 GHz wireless networks, which support relay-assisted dual-hop transmission, have been widely adopted in recent years to extend communication distance and bypass obstacles in the 60 GHz band. However, link scheduling in such a dual-hop architecture is very challenging because several factors must be considered, namely reducing network power consumption, avoiding overloaded APs and relays, and adapting to network dynamics. To this end, Wu et al. [7] investigate the problem of energy-efficient link scheduling with load constraints (ELL) and deal with network dynamics by presenting a fine-grained energy model for dual-hop 60 GHz networks and proposing a polynomial-time global scheduling algorithm.
Job-pool-based workload estimation, which analyzes the characteristics of existing tasks' workloads to estimate the workload of currently running tasks, has attracted a lot of attention recently. However, the workload patterns of some tasks do exhibit seasonality and trend, and for those tasks conventional per-job regression methods may yield better workload prediction results. Also, in some cases, new tasks may not follow the workload patterns of existing tasks in the pool.
Thus, Yu et al. [8] develop an integrated scheme that combines clustering and regression for workload prediction.
Training a deep neural network (DNN) requires exorbitant resources. Researchers often use distributed parallel training on GPUs to obtain larger models faster. This approach has its drawbacks, though. On one hand, a GPU's expanded compute capacity also produces bigger bottlenecks in inter-GPU communication during model training, and multi-GPU systems lead to complex connectivity. Workload schedulers then have to consider hardware topology and workload communication requirements when allocating GPU resources, in order to optimize execution time and improve utilization in a heterogeneous environment. On the other hand, the high memory requirements of training a DNN model make running the training processes on GPUs onerous. To contend with this, Zhang et al. [9] introduce two execution optimization methods based on pipeline-hybrid parallelism in a GPU cluster with heterogeneous networking.
A sketch is a compact data structure used to summarize data streams. It is widely used in network traffic measurement, and its accuracy is higher than that of traditional methods. Typical sketches currently include the Count-Min Sketch, CU Sketch, and Count Sketch. Based on the characteristics of network traffic, Zhu et al. [10] propose a new sketch framework called Self-Adaption Sketch, which combines a sketch with a Bloom filter. In the framework, the sketch is created dynamically and the memory space is adjusted in a timely manner according to the network traffic by using the concept of carrying.
We thank the authors for their contributions, including those whose papers are not included in this special issue. We would also like to acknowledge the thoughtful work of the many reviewers who provided valuable evaluations and recommendations.
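For readers unfamiliar with the sketches named above, the following is a minimal illustration of a plain Count-Min Sketch, the simplest of the three. It is not the Self-Adaption Sketch of Zhu et al., whose dynamic allocation and Bloom filter are not reproduced here. The class name, the default `width` and `depth`, the salted BLAKE2b hashing, and the sample flow keys are all assumptions made for the example.

```python
import hashlib


class CountMinSketch:
    """Fixed-size frequency summary for a stream (illustrative sketch only).

    Estimates can overcount because of hash collisions, but never undercount.
    """

    def __init__(self, width=1024, depth=4):
        self.width = width    # counters per row (assumed default)
        self.depth = depth    # number of independent hash rows (assumed default)
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        # One salted hash per row; the row number serves as the salt.
        digest = hashlib.blake2b(
            key.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(2, "big"),
        ).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, key, count=1):
        # Increment one counter in every row.
        for row in range(self.depth):
            self.rows[row][self._index(row, key)] += count

    def estimate(self, key):
        # The smallest counter is the one least inflated by collisions.
        return min(
            self.rows[row][self._index(row, key)] for row in range(self.depth)
        )


# Hypothetical traffic: each item is the source address of one packet.
sketch = CountMinSketch(width=64, depth=4)
for flow in ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3", "10.0.0.1"]:
    sketch.add(flow)

print(sketch.estimate("10.0.0.1"))  # at least 3, the true packet count
```

Memory use is fixed at `width * depth` counters regardless of how many distinct flows appear. That fixed size is what makes a sketch compact, and it is also why a framework might want to adjust the memory space as traffic changes.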