Abstract: With the rapid growth of data volume, massive data has become one of the factors hindering the development of enterprises. Processing data efficiently and reducing the pressure of concurrent data access have become the driving forces behind the continuous development of big data solutions. This article studies a MapReduce parallel computing framework based on multiple data fusion sensors and GPU clusters. The experiments run on a fully distributed Hadoop cluster, and the MapReduce-based single-source shortest path algorithm is implemented entirely in Java. The cluster is built from 8 ordinary physical machines, and all nodes have essentially the same configuration. The MapReduce framework divides a submitted job into several map tasks and assigns them to different computing nodes. The map phase produces intermediate files in the same format as the final output; the system then generates several reduce tasks and distributes these files to different cluster nodes for execution. The experiment measures how the running time of the PSON algorithm changes as the size of the test data set gradually increases, with the hardware and software configuration of the Hadoop platform held constant. When the number of computing nodes increases from 2 to 4, the running time drops significantly; as the number of nodes increases further, each additional node reduces the running time less and less. The results show that NESTOR can complete the basic MapReduce workflow, simplifies user development of GPU programs, and delivers a significant speedup for computation-intensive applications.
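The abstract describes the map, shuffle and reduce phases and a MapReduce-based single-source shortest path algorithm written in Java, but gives no code. The following is a minimal, single-process sketch, assuming a Bellman-Ford-style relaxation in which each round corresponds to one MapReduce pass: the map phase emits candidate distances, the shuffle groups them by node id, and the reduce phase keeps the minimum. The class and method names (`MapReduceSssp`, `map`, `reduce`, `shortestPaths`) are hypothetical; the sketch does not use the Hadoop API, assumes non-negative edge weights, and is not the authors' implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-process simulation of a MapReduce-style single-source shortest path
// job (iterative Bellman-Ford relaxation). All names are hypothetical; this is
// not the Hadoop API and not the paper's implementation.
final class MapReduceSssp {
    static final int INF = Integer.MAX_VALUE;

    // Map phase: re-emit the node's own distance and, if the node is reachable,
    // a candidate distance to each neighbour (key = neighbour id).
    static List<Map.Entry<String, Integer>> map(String node, int dist, Map<String, Integer> edges) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        out.add(Map.entry(node, dist));
        if (dist != INF) {
            for (Map.Entry<String, Integer> e : edges.entrySet()) {
                out.add(Map.entry(e.getKey(), dist + e.getValue()));
            }
        }
        return out;
    }

    // Reduce phase: the new distance of a node is its minimum candidate.
    static int reduce(List<Integer> candidates) {
        int best = INF;
        for (int c : candidates) {
            best = Math.min(best, c);
        }
        return best;
    }

    // Driver: repeat map -> shuffle -> reduce rounds until no distance changes.
    // Assumes non-negative edge weights, so the loop ends after at most |V| rounds.
    static Map<String, Integer> shortestPaths(Map<String, Map<String, Integer>> graph, String source) {
        Map<String, Integer> dist = new TreeMap<>();
        for (String n : graph.keySet()) {
            dist.put(n, n.equals(source) ? 0 : INF);
        }
        boolean changed = true;
        while (changed) {
            // Map + shuffle: group all intermediate values by node id.
            Map<String, List<Integer>> groups = new TreeMap<>();
            for (Map.Entry<String, Integer> d : dist.entrySet()) {
                Map<String, Integer> edges = graph.getOrDefault(d.getKey(), Map.of());
                for (Map.Entry<String, Integer> kv : map(d.getKey(), d.getValue(), edges)) {
                    groups.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
                }
            }
            // Reduce: one call per key; record whether any distance improved.
            changed = false;
            for (Map.Entry<String, List<Integer>> g : groups.entrySet()) {
                int best = reduce(g.getValue());
                Integer old = dist.get(g.getKey());
                if (old == null || old != best) {
                    dist.put(g.getKey(), best);
                    changed = true;
                }
            }
        }
        return dist;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> graph = new HashMap<>();
        graph.put("A", Map.of("B", 1, "C", 4));
        graph.put("B", Map.of("C", 2));
        graph.put("C", Map.of("D", 1));
        graph.put("D", Map.of());
        System.out.println(shortestPaths(graph, "A")); // {A=0, B=1, C=3, D=4}
    }
}
```

On a real Hadoop cluster, each round would typically run as a separate job whose reduce output becomes the next round's map input; the sketch keeps everything in memory so that only the data flow between the phases is visible.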

Implementation of MapReduce parallel computing framework based on multi-data fusion sensors and GPU cluster
