Abstract: With the rapid growth of data volume, massive data has become one of the factors that hamper the development of enterprises. The need to process data effectively and to reduce the concurrency pressure of data access drives the continuing development of big data solutions. This article studies a MapReduce parallel computing framework based on multiple data fusion sensors and GPU clusters. The experimental environment is a fully distributed Hadoop cluster built from 8 ordinary physical machines whose node configurations are basically the same, and the single-source shortest path algorithm based on MapReduce is implemented entirely in Java. The MapReduce framework divides a requested job into several map tasks and assigns them to different computing nodes. The map phase generates intermediate files whose format is consistent with the final file format. The system then generates several reduce tasks and distributes these files to different cluster nodes for execution. The experiment measures how the running time of the PSON algorithm changes as the size of the test data set gradually increases while the hardware and software configuration of the Hadoop platform is kept unchanged. When the number of computing nodes increases from 2 to 4, the running time is significantly reduced; as the number of computing nodes continues to increase, the reduction in running time becomes less and less significant. The results show that NESTOR can complete the basic MapReduce workflow and simplifies the process of user development of GPU positive tree order, achieving a significant speedup for applications with large amounts of computation.
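The map and reduce phases described in the abstract can be illustrated with a small sketch of single-source shortest paths. This is not the article's implementation: it runs in plain Java without Hadoop or a GPU, simulating the shuffle step in memory, and the class name `SsspMapReduceSketch`, its methods, and the sample graph are hypothetical names chosen for illustration. It assumes non-negative integer edge weights.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: SSSP as repeated map/shuffle/reduce rounds, simulated in memory.
class SsspMapReduceSketch {
    static final int INF = Integer.MAX_VALUE;

    // Map phase: a node re-emits its own distance and, if reachable,
    // proposes (distance + edge weight) to each of its neighbours.
    static List<Map.Entry<String, Integer>> map(String node, int dist, Map<String, Integer> edges) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        out.add(Map.entry(node, dist));
        if (dist != INF) {
            for (Map.Entry<String, Integer> e : edges.entrySet()) {
                out.add(Map.entry(e.getKey(), dist + e.getValue()));
            }
        }
        return out;
    }

    // Reduce phase: keep the smallest candidate distance proposed for one node.
    static int reduce(List<Integer> candidates) {
        return Collections.min(candidates);
    }

    // Driver: repeat map -> shuffle -> reduce until no distance improves.
    static Map<String, Integer> shortestPaths(Map<String, Map<String, Integer>> graph, String source) {
        Map<String, Integer> dist = new TreeMap<>();
        for (String n : graph.keySet()) {
            dist.put(n, n.equals(source) ? 0 : INF);
        }
        boolean changed = true;
        while (changed) {
            // Shuffle: group the emitted (node, candidate) pairs by node.
            Map<String, List<Integer>> grouped = new TreeMap<>();
            for (String n : graph.keySet()) {
                for (Map.Entry<String, Integer> kv : map(n, dist.get(n), graph.get(n))) {
                    grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
                }
            }
            changed = false;
            for (Map.Entry<String, List<Integer>> g : grouped.entrySet()) {
                int best = reduce(g.getValue());
                if (best < dist.getOrDefault(g.getKey(), INF)) {
                    dist.put(g.getKey(), best);
                    changed = true;
                }
            }
        }
        return dist;
    }

    // Hypothetical four-node graph; every node appears as a key, even with no outgoing edges.
    static Map<String, Map<String, Integer>> sampleGraph() {
        Map<String, Map<String, Integer>> g = new TreeMap<>();
        g.put("A", Map.of("B", 1, "C", 4));
        g.put("B", Map.of("C", 2, "D", 6));
        g.put("C", Map.of("D", 1));
        g.put("D", Map.of());
        return g;
    }

    public static void main(String[] args) {
        System.out.println(shortestPaths(sampleGraph(), "A")); // prints {A=0, B=1, C=3, D=4}
    }
}
```

In a real Hadoop job each round would be a separate MapReduce job whose output files feed the next round, which is one reason the intermediate format matching the final format, as the abstract notes, is convenient.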

GPU Computations on Hadoop Clusters for Massive Data Processing

A Research of MapReduce with GPU Acceleration

A MapReduce Computing Framework Based on GPU Cluster

Lit: A High Performance Massive Data Computing Framework Based on CPU/GPU Cluster

Design and Optimization of a Big Data Computing Framework Based on CPU/GPU Cluster

Efficient OLAP algorithms on GPU-accelerated Hadoop clusters

GPU-based Efficient Join Algorithms on Hadoop

High Performance Computing Via a GPU

Vhadoop: A Scalable Hadoop Virtual Cluster Platform for MapReduce-Based Parallel Machine Learning with Performance Consideration

Scalable Clustering Using Graphics Processors

An Implementation of GPU Accelerated MapReduce: Using Hadoop with OpenCL for Data- and Compute-Intensive Jobs

GCMR: A GPU Cluster-Based MapReduce Framework for Large-Scale Data Processing

Accelerating Fast Fourier Transforms Using Hadoop and CUDA

Fast Clustering of Data Streams Using Graphics Processors

Hybrid Map Task Scheduling for GPU-Based Heterogeneous Clusters

Implementation and Evaluation of Massive Data Processing Paradigm on High Performance Computers

Applying GPU and POSIX thread technologies in massive remote sensing image data processing

Implementation of MapReduce parallel computing framework based on multi-data fusion sensors and GPU cluster

A Parallel GPU-Based Approach to Clustering Very Fast Data Streams

MapReduce Across Distributed Clusters for Data-intensive Applications

Data Partitioning Strategy of GPU Heterogeneous Clusters Based on Learning