Abstract: Machine learning is widely applied in emerging data-intensive applications and must be optimized and accelerated by powerful engines to process very large-scale data. Recently, instruction-set-based accelerators on Field Programmable Gate Arrays (FPGAs) have become a promising topic for machine learning applications. The customized instructions can be further scheduled to achieve higher instruction-level parallelism. In this article, we design a ubiquitous accelerator with out-of-order automatic parallelization for large-scale data-intensive applications. The accelerator accommodates four representative applications: clustering algorithms, deep neural networks, genome sequencing, and collaborative filtering. To improve coarse-grained instruction-level parallelism, the accelerator employs an out-of-order scheduling method to enable parallel dataflow computation. We use Colored Petri Net (CPN) tools to analyze the dependences in the applications, and build a hardware prototype on a real FPGA platform. For clustering applications, the accelerator supports four different algorithms: K-Means, SLINK, PAM, and DBSCAN. For collaborative filtering applications, it accommodates Tanimoto, Euclidean, Cosine, and Pearson Correlation as similarity metrics. For deep learning applications, we implement hardware accelerators for both the training process and the inference process. Finally, for genome sequencing, we design a hardware accelerator for the BWA-SW algorithm. Experimental results show that the accelerator architecture can reach up to 25X speedup against Intel processors with affordable hardware cost, insignificant power consumption, and high flexibility.
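
For readers unfamiliar with the four similarity metrics named for collaborative filtering, the following is a minimal software sketch of their textbook definitions on dense rating vectors. It is not the accelerator's hardware implementation, which the abstract does not describe. The function names are hypothetical, and the mapping of Euclidean distance to a similarity via 1 / (1 + d) is one common convention assumed here for illustration.

```python
import math

def _dot(a, b):
    # Inner product of two equal-length vectors.
    return sum(x * y for x, y in zip(a, b))

def tanimoto_similarity(a, b):
    # Tanimoto coefficient for real-valued vectors:
    # a.b / (|a|^2 + |b|^2 - a.b)
    dot = _dot(a, b)
    denom = _dot(a, a) + _dot(b, b) - dot
    return dot / denom if denom else 0.0

def euclidean_similarity(a, b):
    # Assumed convention: convert distance d into a similarity 1 / (1 + d).
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + dist)

def cosine_similarity(a, b):
    # Cosine of the angle between the two vectors.
    denom = math.sqrt(_dot(a, a)) * math.sqrt(_dot(b, b))
    return _dot(a, b) / denom if denom else 0.0

def pearson_correlation(a, b):
    # Cosine similarity of the mean-centered vectors.
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    ca = [x - mean_a for x in a]
    cb = [y - mean_b for y in b]
    return cosine_similarity(ca, cb)

if __name__ == "__main__":
    u = [1.0, 2.0, 3.0]
    v = [2.0, 4.0, 6.0]
    for metric in (tanimoto_similarity, euclidean_similarity,
                   cosine_similarity, pearson_correlation):
        print(metric.__name__, metric(u, v))
```

All four metrics reduce to dot products, sums of squares, and a final division or square root, which is one reason they lend themselves to a shared set of coarse-grained instructions.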

An experimental evaluation of extreme learning machines on several hardware devices

Extreme Learning Machine with Multiple Kernels

Multiple-kernel-learning-based Extreme Learning Machine for Classification Design

Fast Sparse Approximation of Extreme Learning Machine

Multi-class AdaBoost ELM

Optimizing Extreme Learning Machine Via Generalized Hebbian Learning and Intrinsic Plasticity Learning

System-on-a-Chip (SoC)-Based Hardware Acceleration for an Online Sequential Extreme Learning Machine (OS-ELM)

A Batch Inheritance Extreme Learning Machine Algorithm Based on Regular Optimization

Extreme Learning Machine Combining Hidden-Layer Feature Weighting and Batch Training for Classification

Trends in extreme learning machines

Extreme learning machine: Theory and applications

A study on effectiveness of extreme learning machine

Trends in Extreme Learning Machines: A Review

An Extreme Learning Machine Based on Artificial Immune System

A review on extreme learning machine

Extreme learning machine: algorithm, theory and applications

Neuromorphic Extreme Learning Machines with Bimodal Memristive Synapses

A Parallel Multiclassification Algorithm for Big Data Using an Extreme Learning Machine

FPGA Implementation of Precise Convolutional Neural Network for Extreme Learning Machine

A Ubiquitous Machine Learning Accelerator With Automatic Parallelization on FPGA

MR-ELM: a MapReduce-based framework for large-scale ELM training in big data era