Abstract: This work presents a generic computer vision system designed to exploit trained deep Convolutional Neural Networks (CNNs) as generic feature extractors and to combine these features with more traditional hand-crafted features. The system is a single structure that can be applied to a large number of different image classification tasks. Three substructures are proposed for building the generic computer vision system from handcrafted and non-handcrafted features: i) one that remaps the output layer of a trained CNN to classify a different problem using an SVM; ii) a second that uses the output of the penultimate layer of a trained CNN as a feature vector to feed an SVM; and iii) a third that merges the outputs of several deep layers, applies a dimensionality reduction method, and uses the resulting features as input to an SVM. The application of feature transform techniques to reduce the dimensionality of the feature sets extracted from the deep layers is one of the main contributions of this paper. Three approaches are used for the non-handcrafted features: deep transfer learning features based on CNNs, the principal component analysis network (PCAN), and the compact binary descriptor (CBD). For the handcrafted features, a wide variety of state-of-the-art algorithms are considered: Local Ternary Patterns, Local Phase Quantization, Rotation Invariant Co-occurrence Local Binary Patterns, Completed Local Binary Patterns, Rotated Local Binary Pattern Image, Globally Rotation Invariant Multi-scale Co-occurrence Local Binary Pattern, and several others. The computer vision system based on the proposed approach was tested on many different datasets, and the strong performance recorded across them demonstrates the generalizability of the approach. The Wilcoxon signed-rank test is used to compare the different methods, and the independence of the different methods is studied using the Q-statistic.
To facilitate replication of our experiments, the MATLAB source code will be available at the following link: https://www.dropbox.com/s/bguw035yrqz0pwp/ElencoCode.docx?dl=0
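The third substructure (merging several deep layers, then reducing dimensionality before the SVM) can be sketched as follows. This is a minimal illustration, not the authors' MATLAB code. The abstract only speaks of "feature transform techniques", so we assume a truncated orthonormal DCT-II as the transform; PCA would be an equally plausible substitute. The functions `dct_ii` and `fuse_and_reduce`, the parameter `n_keep`, and the activation values are all hypothetical.

```python
import math


def dct_ii(x):
    """Orthonormal DCT-II of a 1-D sequence (naive O(n^2) version for clarity)."""
    n = len(x)
    coeffs = []
    for k in range(n):
        s = sum(v * math.cos(math.pi * (i + 0.5) * k / n) for i, v in enumerate(x))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * s)
    return coeffs


def fuse_and_reduce(layer_outputs, n_keep):
    """Concatenate flattened layer activations, then keep the first n_keep DCT coefficients."""
    fused = [v for layer in layer_outputs for v in layer]
    if not 0 < n_keep <= len(fused):
        raise ValueError("n_keep must be between 1 and the fused feature length")
    return dct_ii(fused)[:n_keep]


# Hypothetical activations from two deep layers of a trained CNN for one image.
layer_a = [0.2, 0.4, 0.1, 0.9]
layer_b = [1.0, 0.0, 0.5]
features = fuse_and_reduce([layer_a, layer_b], n_keep=3)
print(len(features))  # 3
```

Because the transform is orthonormal, no information is lost before truncation; keeping only the low-order coefficients tends to retain most of the energy of correlated activations. In a full pipeline, the reduced vectors of all training images would then be used to train the SVM.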
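Local Ternary Patterns, the first handcrafted descriptor listed above, extend LBP by quantizing each neighbor-center difference into three levels with a threshold t (+1 if the neighbor is at least t above the center, -1 if at least t below, 0 otherwise). The ternary code is usually split into an "upper" and a "lower" binary pattern. The sketch below assumes a 3x3 neighborhood, a clockwise neighbor ordering starting at the top-left pixel, and a grayscale image stored as a list of lists. `ltp_codes` and `ltp_histograms` are hypothetical helper names, and practical descriptors typically compute histograms per image region rather than over the whole image.

```python
# Offsets (row, col) of the 8 neighbors, clockwise from the top-left pixel (assumed ordering).
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def ltp_codes(img, r, c, t):
    """Return the (upper, lower) binary codes of the Local Ternary Pattern at pixel (r, c)."""
    center = img[r][c]
    upper = lower = 0
    for bit, (dr, dc) in enumerate(NEIGHBORS):
        p = img[r + dr][c + dc]
        if p >= center + t:      # ternary value +1 -> set bit in the upper pattern
            upper |= 1 << bit
        elif p <= center - t:    # ternary value -1 -> set bit in the lower pattern
            lower |= 1 << bit
        # |p - center| < t is ternary value 0 and sets no bit
    return upper, lower


def ltp_histograms(img, t):
    """Concatenate 256-bin histograms of upper and lower codes over all interior pixels."""
    hist_upper, hist_lower = [0] * 256, [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            up, lo = ltp_codes(img, r, c, t)
            hist_upper[up] += 1
            hist_lower[lo] += 1
    return hist_upper + hist_lower


patch = [[10, 20, 30],
         [40, 25, 26],
         [5, 24, 50]]
print(ltp_codes(patch, 1, 1, t=5))  # (148, 67)
```

Splitting the ternary code into two binary patterns keeps each histogram at 256 bins instead of the 3^8 bins a direct ternary encoding would need.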
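The Wilcoxon signed-rank test compares two methods over paired results, for example the accuracy of each method on the same datasets. A minimal sketch follows, assuming paired score lists. It discards zero differences, assigns average ranks to ties (detected by exact equality), and computes z and a two-sided p-value with the normal approximation and no tie correction, which is only rough for small samples such as the four usable pairs in the example. The function name and the scores are hypothetical, and an exact test from a statistics library would be preferable for real comparisons.

```python
import math


def wilcoxon_signed_rank(a, b):
    """Return (W, z, p) for paired samples a and b.

    W = min(W+, W-); z and the two-sided p-value use the normal approximation
    without a tie correction, so they are only rough for small samples.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]  # zero differences are discarded
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Grow j over a run of tied absolute differences (exact equality).
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value under the normal approximation
    return w, z, p


# Hypothetical accuracies (%) of two methods on the same five datasets.
method_a = [90, 85, 88, 92, 79]
method_b = [88, 86, 84, 92, 75]
w, z, p = wilcoxon_signed_rank(method_a, method_b)
print(w)  # 1.0
```

Here the fourth dataset is dropped (equal scores), the two differences of 4 share rank 3.5, and W+ = 9 against W- = 1.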
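The Q-statistic (Yule's Q) measures the dependence between two classifiers from their per-sample correctness. Let N11 be the number of samples both classify correctly, N00 the number both get wrong, N10 the number only classifier i gets right, and N01 the number only classifier k gets right. Then Q = (N11·N00 - N01·N10) / (N11·N00 + N01·N10), which lies in [-1, 1]. Statistically independent classifiers have an expected Q of 0, and lower values indicate more complementary classifiers, which are better candidates for fusion. The sketch assumes boolean correctness lists; the function name and the data are hypothetical.

```python
def q_statistic(correct_i, correct_k):
    """Yule's Q between two classifiers, given per-sample correctness flags."""
    n11 = n00 = n10 = n01 = 0
    for ci, ck in zip(correct_i, correct_k):
        if ci and ck:
            n11 += 1          # both classifiers correct
        elif not ci and not ck:
            n00 += 1          # both classifiers wrong
        elif ci:
            n10 += 1          # only classifier i correct
        else:
            n01 += 1          # only classifier k correct
    denom = n11 * n00 + n01 * n10
    if denom == 0:
        raise ValueError("Q is undefined when N11*N00 + N01*N10 == 0")
    return (n11 * n00 - n01 * n10) / denom


# Hypothetical correctness of two classifiers on eight test images.
svm_cnn = [True, True, True, False, True, False, True, True]
svm_ltp = [True, False, True, False, True, True, False, True]
print(q_statistic(svm_cnn, svm_ltp))  # 0.3333333333333333
```

In this example N11 = 4, N00 = 1, N10 = 2 and N01 = 1, so Q = (4 - 2) / (4 + 2) = 1/3: the two classifiers are positively correlated but still partly complementary.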
