Abstract: Subspace learning is an effective and widely used technique for image feature extraction and classification. However, for large-scale image recognition problems in real-world applications, many subspace learning methods suffer from a heavy computational burden. To reduce the computational time and improve the recognition performance of subspace learning in this setting, we introduce the idea of parallel computing, which reduces the time complexity by splitting the original task into several subtasks, and we develop a parallel subspace learning framework. In this framework, we first divide the sample set into several subsets using two random data division strategies: equal data division and unequal data division. These two strategies correspond to equal and unequal computational abilities of the nodes in a parallel computing environment. Next, we calculate projection vectors from each subset in parallel. The graph embedding technique is employed to provide a general formulation for parallel feature extraction. After combining the extracted features from all nodes, we present a unified criterion to select the most distinctive features for classification. Under the developed framework, we propose a supervised and an unsupervised parallel subspace learning approach, called parallel linear discriminant analysis (PLDA) and parallel locality preserving projection (PLPP), respectively. PLDA selects the features with the largest Fisher scores by estimating the weighted and unweighted sample scatter, while PLPP selects the features with the smallest Laplacian scores by constructing a whole affinity matrix. Theoretically, we analyze the time complexities of the proposed approaches and provide fundamental support for applying the random division strategies. In the experiments, we establish two real parallel computing environments and employ four public image and video databases as the test data. Experimental results demonstrate that the proposed approaches outperform several related supervised and unsupervised subspace learning methods and significantly reduce the computational time.
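
As a rough illustration of two steps named in the abstract, the sketch below shows a random data division whose subset sizes follow each node's relative capacity, and a feature selection step that keeps the features with the largest Fisher scores. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names (`random_division`, `fisher_score`, `select_top_features`) and the `capacities` parameter are hypothetical, the Fisher score is the textbook per-feature ratio of between-class to within-class scatter rather than the weighted and unweighted scatter estimates used by PLDA, and the graph embedding projection computed on each node is omitted.

```python
import random


def random_division(n_samples, capacities, seed=0):
    """Randomly assign sample indices to nodes in proportion to `capacities`.

    Equal capacities give an equal division; unequal capacities give an
    unequal division. `capacities` is a hypothetical stand-in for the
    relative computational ability of each node.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    total = sum(capacities)
    subsets, start, cumulative = [], 0, 0
    for capacity in capacities:
        cumulative += capacity
        end = round(n_samples * cumulative / total)
        subsets.append(indices[start:end])
        start = end
    return subsets


def fisher_score(values, labels):
    """Textbook Fisher score of one feature: between-class over within-class scatter."""
    overall_mean = sum(values) / len(values)
    between = within = 0.0
    for c in set(labels):
        group = [v for v, y in zip(values, labels) if y == c]
        mean_c = sum(group) / len(group)
        between += len(group) * (mean_c - overall_mean) ** 2
        within += sum((v - mean_c) ** 2 for v in group)
    # Small constant avoids division by zero for perfectly separated features.
    return between / (within + 1e-12)


def select_top_features(feature_columns, labels, k):
    """Return the indices of the k features with the largest Fisher scores."""
    scores = [fisher_score(column, labels) for column in feature_columns]
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]
```

In a full pipeline, each subset returned by `random_division` would be sent to one node, each node would compute its own projection vectors, and `select_top_features` would be applied to the combined projected features; an unsupervised variant would rank by Laplacian score in ascending order instead.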

A distributed approach for large-scale classifier training and image classification

An On-Line Learning Approach with Support Vector Domain Classifier

Training Inter-Related Classifiers for Automatic Image Classification and Annotation

On Distributed Deep Network for Processing Large-Scale Sets of Complex Data

A Distributed SVM Method Based on the Iterative MapReduce

A Mahout Based Image Classification Framework for Very Large Dataset

Improved classification approach for use with large-scale scene images in the Hadoop cluster environment

Distributed Online Semi-Supervised Support Vector Machine

A Distributed and Scalable Machine Learning Approach for Big Data

Distributed Learning Strategy Based On Chips For Classification With Large-Scale Dataset

A Parallel SVM Training Algorithm on Large-Scale Classification Problems

A Parallel Adaboost-Backpropagation Neural Network for Massive Image Dataset Classification

Distributed training of multiclass conic-segmentation support vector machines on communication constrained networks

Cost-sensitive Learning of Hierarchical Tree Classifiers for Large-Scale Image Classification and Novel Category Detection

Distributed Classification for Imbalanced Big Data in Distributed Environments

A Distributed Deep Representation Learning Model for Big Image Data Classification

Large-Scale Feature Matching With Distributed And Heterogeneous Computing

Distributed multi-classification support vector machines

SVM Algorithms for Large Scale Classification Problems Based on Data Partition and Ensemble Learning

A MapReduce-Based Distributed SVM for Scalable Data Type Classification

Supervised and Unsupervised Parallel Subspace Learning for Large-Scale Image Recognition.