Towards the effectiveness of Deep Convolutional Neural Network based Fast Random Forest Classifier

Mrutyunjaya Panda
DOI: https://doi.org/10.48550/arXiv.1609.08864
2016-09-28
Abstract: Deep learning, though still quite young as an area of machine learning research, has proven effective on complex, high-dimensional datasets, including but not limited to images, text, and speech, by learning multiple levels of representation and abstraction. Because a great deal of research on these datasets already exists, improving on previous results requires careful attention. Careful setting of deep learning parameters is of paramount importance to avoid overfitting, unlike conventional methods that have only a few parameters to set. A Deep Convolutional Neural Network (DCNN) with multiple layers of composition and appropriate settings may be an efficient machine learning method that can outperform conventional methods by a wide margin. However, because it learns slowly, there is always a chance of overfitting during the feature selection process, which can be addressed with a regularization method called dropout. Fast Random Forest (FRF) is a powerful ensemble classifier, especially when datasets are noisy and the number of attributes is large compared to the number of instances, as is the case for bioinformatics datasets. Several publicly available bioinformatics datasets, as well as handwritten digit recognition and image segmentation datasets, are used to evaluate the proposed approach. The excellent performance of the proposed DCNN-based feature selection with the FRF classifier on high-dimensional datasets makes it a fast and accurate classifier compared to the state of the art.
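The abstract names dropout as the regularizer used against overfitting. As a rough, framework-free illustration of the idea (not the paper's implementation, which is not described here), the following Python sketch implements the common "inverted dropout" variant on a plain list of activations. The function name `dropout` and its parameters are hypothetical.

```python
import random
from typing import List, Optional


def dropout(activations: List[float], p: float, training: bool,
            rng: Optional[random.Random] = None) -> List[float]:
    """Hypothetical inverted-dropout helper.

    During training, each activation is zeroed with probability p and the
    survivors are scaled by 1 / (1 - p), so the expected value of every unit
    is unchanged. At inference time the input is returned unchanged.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(activations)
    rng = rng or random.Random()
    keep = 1.0 - p
    # Keep a unit with probability `keep` and rescale it; otherwise drop it.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


if __name__ == "__main__":
    rng = random.Random(0)
    x = [1.0] * 10
    print(dropout(x, p=0.5, training=True, rng=rng))  # about half zeros, the rest 2.0
    print(dropout(x, p=0.5, training=False))          # unchanged
```

Scaling at training time (rather than at inference) is the design choice that makes the inference path a plain identity, which is why most deep learning frameworks use this variant.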
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The main problem this paper attempts to solve is improving classification efficiency and accuracy on high-dimensional, complex datasets. Specifically, the author studies how to use deep convolutional neural networks (DCNN) for effective feature selection in fields such as bioinformatics, handwritten digit recognition, and image segmentation, and how to combine them with the Fast Random Forest (FRF) classifier to achieve efficient and accurate classification.

### Background and Motivation of the Paper

As dataset sizes grow rapidly, efficient machine learning techniques must not only address the curse of dimensionality but also be able to generate enough simulated collisions from the full feature space of the original data to describe the relative likelihood. Traditional shallow neural networks perform poorly when modeling complex relative-likelihood functions. Deep neural networks can alleviate this problem, but they train slowly and are prone to overfitting. To overcome these challenges, the author proposes using a DCNN with dropout regularization for feature selection and combining it with the FRF classifier.

### Main Contributions

1. **Expand the Application Range of DCNN**: Explore the potential of DCNN in fields beyond image recognition.
2. **Feature Selection Method**: Propose using a DCNN as a pre-processing step to select the best feature subset.
3. **Classifier Selection**: Use the FRF classifier for the final classification.
4. **Experimental Verification**: Verify the effectiveness of the combined DCNN-FRF method on datasets from several fields, including bioinformatics, image segmentation, and handwritten digit recognition.

### Experimental Design and Results

The author conducted experiments in three application areas:

1. **Bioinformatics Data Sets**: arrhythmia, leukemia, lymphoma, and prostate cancer.
2. **Handwritten Digit Recognition Data Sets**: Opt-Digit, Pen-Digit, and KDD Cup Japanese Vowel.
3. **Image Segmentation Data Sets**: Segment and Landsat satellite image.

Using 5-fold cross-validation, the author compared the performance of DCNN alone with that of the combined DCNN-FRF method under different settings. The results show that DCNN-FRF achieved higher classification accuracy and shorter model-building time on most datasets.

### Statistical Significance Test

To further verify the method, the author also performed a two-tailed paired t-test. The results indicate that the performance differences of DCNN-FRF are statistically significant on all datasets, and that it reaches the highest classification accuracy on some of them.

### Conclusions and Future Work

This paper demonstrates the effectiveness of combining DCNN with the FRF classifier on high-dimensional, complex datasets, especially in bioinformatics, image segmentation, and handwritten digit recognition. Future work will focus on improving the DCNN architecture so that it can be applied to larger, more complex, and noisier datasets to further improve classification performance.
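The evaluation protocol summarized above (5-fold cross-validation, then a two-tailed paired t-test on the per-fold results) can be sketched in plain Python as follows. This is an illustrative sketch, not the paper's code: the function names are hypothetical, it uses the textbook (uncorrected) paired t-test because the exact variant used in the paper is not stated here, and the critical value 2.776 is the standard two-tailed 5% value of Student's t for 4 degrees of freedom.

```python
import math
import random
from typing import List, Sequence, Tuple


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Hypothetical helper: shuffle 0..n-1 once, split into k folds,
    and return one (train_indices, test_indices) pair per fold."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Distribute the remainder so fold sizes differ by at most one.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        folds.append((train, test))
        start += size
    return folds


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Standard paired t statistic on per-fold scores (df = len(a) - 1)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)  # sample variance
    if var == 0.0:
        raise ValueError("differences have zero variance")
    return mean / math.sqrt(var / n)


# Two-tailed critical value of Student's t for alpha = 0.05 and df = 4,
# i.e. 5 paired folds, taken from standard t tables.
T_CRIT_DF4_TWO_TAILED_005 = 2.776


def significantly_different(a: Sequence[float], b: Sequence[float]) -> bool:
    """Two-tailed test at alpha = 0.05; valid only for exactly 5 paired folds."""
    if len(a) != 5 or len(b) != 5:
        raise ValueError("critical value is hard-coded for 5 folds")
    return abs(paired_t_statistic(a, b)) > T_CRIT_DF4_TWO_TAILED_005
```

In use, one would train both models (for example DCNN alone and DCNN-FRF) on each `train` split returned by `k_fold_indices`, record their accuracies on the matching `test` split, and pass the two lists of five accuracies to `significantly_different`. Because the training sets of different folds overlap, per-fold scores are not fully independent, which is a known limitation of the plain paired t-test on cross-validation results.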