Abstract: With the rapid growth of data volume, handling massive data has become one of the problems that hold back enterprise development. The need to process data effectively and to reduce the concurrency pressure of data access continues to drive the development of big data solutions. This article studies a MapReduce parallel computing framework based on multiple data fusion sensors and GPU clusters. The experiments run on a fully distributed Hadoop cluster, and the MapReduce-based single-source shortest path algorithm is implemented entirely in Java. The cluster is built from 8 ordinary physical machines, and every node has essentially the same configuration. The MapReduce framework divides a requested job into several map tasks and assigns them to different computing nodes. The map phase produces intermediate files in the same format as the final output; the system then creates several reduce tasks and distributes these files to different cluster nodes for execution. The experiment measures how the running time of the PSON algorithm changes as the test data set grows while the hardware and software configuration of the Hadoop platform is held fixed. When the number of computing nodes increases from 2 to 4, the running time drops significantly; as more nodes are added, the reduction in running time becomes progressively smaller. The results show that NESTOR can complete the basic MapReduce workflow and simplifies the development of GPU programs for users, giving a significant speedup for computation-intensive applications.
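The map, intermediate-output, and reduce steps described in the abstract can be illustrated with a minimal in-memory sketch of a single-source shortest path computation. This is not the paper's implementation: it uses no Hadoop API and no GPU, and the class name `SsspMapReduceSketch`, the methods `map`, `reduce`, `shortestPaths`, and `sampleGraph`, and the sample graph itself are all hypothetical names chosen for illustration. It assumes non-negative integer edge weights and simply repeats map/reduce rounds until no distance improves.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: an in-memory imitation of iterated MapReduce rounds,
// not the Hadoop job or the GPU framework described in the abstract.
public class SsspMapReduceSketch {
    static final int INF = Integer.MAX_VALUE; // marks "not reached yet"

    // Map step: a node re-emits its own distance and, if it has been
    // reached, a candidate distance for each of its neighbors.
    static List<Map.Entry<String, Integer>> map(String node, int dist, Map<String, Integer> edges) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        out.add(Map.entry(node, dist));
        if (dist != INF) {
            for (Map.Entry<String, Integer> e : edges.entrySet()) {
                out.add(Map.entry(e.getKey(), dist + e.getValue()));
            }
        }
        return out;
    }

    // Reduce step: keep the smallest candidate distance seen for one node.
    static int reduce(List<Integer> candidates) {
        int best = INF;
        for (int c : candidates) {
            best = Math.min(best, c);
        }
        return best;
    }

    // Driver: repeat map -> group by key -> reduce until nothing changes.
    static Map<String, Integer> shortestPaths(Map<String, Map<String, Integer>> graph, String source) {
        Map<String, Integer> dist = new TreeMap<>();
        for (String n : graph.keySet()) {
            dist.put(n, n.equals(source) ? 0 : INF);
        }
        boolean changed = true;
        while (changed) {
            // "Shuffle": group the intermediate key/value pairs by node.
            Map<String, List<Integer>> grouped = new HashMap<>();
            for (String n : graph.keySet()) {
                for (Map.Entry<String, Integer> kv : map(n, dist.get(n), graph.get(n))) {
                    grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
                }
            }
            changed = false;
            for (Map.Entry<String, List<Integer>> g : grouped.entrySet()) {
                int best = reduce(g.getValue());
                if (best < dist.getOrDefault(g.getKey(), INF)) {
                    dist.put(g.getKey(), best);
                    changed = true;
                }
            }
        }
        return dist;
    }

    // Hypothetical sample graph: adjacency map of node -> (neighbor -> weight).
    static Map<String, Map<String, Integer>> sampleGraph() {
        Map<String, Map<String, Integer>> g = new HashMap<>();
        g.put("A", Map.of("B", 1, "C", 4));
        g.put("B", Map.of("C", 2, "D", 6));
        g.put("C", Map.of("D", 1));
        g.put("D", Map.of());
        return g;
    }

    public static void main(String[] args) {
        System.out.println(shortestPaths(sampleGraph(), "A")); // prints {A=0, B=1, C=3, D=4}
    }
}
```

In a real Hadoop job each round would be a separate MapReduce job whose output feeds the next one; the loop in `shortestPaths` stands in for that chaining.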

GCMR: A GPU Cluster-Based MapReduce Framework for Large-Scale Data Processing

A MapReduce Computing Framework Based on GPU Cluster.

A Research of MapReduce with GPU Acceleration

A-MapCG: an Adaptive MapReduce Framework for GPUs.

GPU Computations on Hadoop Clusters for Massive Data Processing

Accelerate MapReduce on GPUs with multi-level reduction

An Implementation of GPU Accelerated MapReduce: Using Hadoop with OpenCL for Data- and Compute-Intensive Jobs

Accelerating Support Vector Machine Learning With GPU-Based MapReduce

Implementation of MapReduce parallel computing framework based on multi-data fusion sensors and GPU cluster

Providing Source Code Level Portability Between CPU and GPU with MapCG

GAMMA: A Graph Pattern Mining Framework for Large Graphs on GPU.

Hybrid Map Task Scheduling for GPU-Based Heterogeneous Clusters

gZCCL: Compression-Accelerated Collective Communication Framework for GPU Clusters

A Chunking Method for Euclidean Distance Matrix Calculation on Large Dataset Using Multi-GPU

A Programming Framework Based on Multi-GPU

An Efficient Grouped Virtual MapReduce Cluster

A Novel Multi-CPU/GPU Collaborative Computing Framework for SGD-based Matrix Factorization

GMH: A Message Passing Toolkit for GPU Clusters

Parallel Data Mining on CUDA-enabled Graphics Processing Unit (GPU)

CuMF_SGD: Fast and Scalable Matrix Factorization.