Semi-supervised Feature Analysis by Mining Correlations among Multiple Tasks

Xiaojun Chang, Yi Yang
DOI: https://doi.org/10.1109/TNNLS.2016.2582746
2015-01-12
Abstract: In this paper, we propose a novel semi-supervised feature selection framework by mining correlations among multiple tasks and apply it to different multimedia applications. Instead of independently computing the importance of features for each task, our algorithm leverages shared knowledge from multiple related tasks, thus improving the performance of feature selection. Note that we build our algorithm on the assumption that different tasks share common structures. The proposed algorithm selects features in a batch mode, by which the correlations between different features are taken into consideration. Besides, considering the fact that labeling a large amount of training data in the real world is both time-consuming and tedious, we adopt manifold learning, which exploits both labeled and unlabeled training data for feature space analysis. Since the objective function is non-smooth and difficult to solve, we propose an iterative algorithm with fast convergence. Extensive experiments on different applications demonstrate that our algorithm outperforms other state-of-the-art feature selection algorithms.
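The batch-mode selection and the iterative solver mentioned in the abstract can be illustrated with a minimal single-task sketch. It assumes an ℓ2,1-norm row-sparsity penalty, `min_W ||X^T W - Y||_F^2 + alpha * ||W||_{2,1}`, solved by iterative reweighting. This is a common construction for non-smooth feature selection objectives, not necessarily the authors' exact objective, and it omits the multi-task and unlabeled-data terms. The function name `l21_feature_selection` and its parameters are hypothetical.

```python
import numpy as np


def l21_feature_selection(X, Y, alpha=1.0, n_iter=50, eps=1e-8):
    """Hypothetical sketch: rank features via min_W ||X^T W - Y||_F^2 + alpha * ||W||_{2,1}.

    Assumption: single task, labeled data only (a simplification of the paper's setting).
    X: (d, n) data matrix, one column per sample.
    Y: (n, c) label indicator matrix.
    Returns W (d, c) and feature indices sorted from most to least important.
    """
    d = X.shape[0]
    XXt = X @ X.T
    XY = X @ Y
    D = np.eye(d)  # first pass is a plain ridge-style solve
    for _ in range(n_iter):
        # For fixed D, setting the gradient to zero gives (X X^T + alpha D) W = X Y
        W = np.linalg.solve(XXt + alpha * D, XY)
        # Reweight: D_ii = 1 / (2 ||w^i||_2); eps avoids division by zero for vanishing rows
        row_norms = np.linalg.norm(W, axis=1)
        D = np.diag(1.0 / (2.0 * row_norms + eps))
    # Each row of W holds one feature's weights for all classes; score features by row norm
    scores = np.linalg.norm(W, axis=1)
    return W, np.argsort(-scores)


# Usage: only features 0 and 1 determine the label
rng = np.random.default_rng(0)
X_demo = rng.standard_normal((10, 500))
labels = (X_demo[0] + X_demo[1] > 0).astype(int)
Y_demo = np.eye(2)[labels]
W_demo, ranking = l21_feature_selection(X_demo, Y_demo, alpha=1.0)
print(ranking[:2])
```

Because the penalty acts on whole rows of `W`, features are scored jointly across all classes rather than one at a time, which is one way to read the abstract's "batch mode".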
Machine Learning
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper asks how feature selection can be improved by exploiting both the correlations among multiple related tasks and the advantages of semi-supervised learning. Specifically, the authors propose a novel semi-supervised feature selection framework (Semi-supervised Feature selection by Mining Correlations among multiple tasks, SFMC), which selects features by mining the correlations among multiple tasks and by combining labeled and unlabeled data.

#### Main problems include:

1. **Redundant features in high-dimensional data**:
   - In many computer vision and pattern recognition applications, data representations are very high-dimensional. Many features are noisy or correlated with each other, which degrades the performance of subsequent data analysis tasks.
2. **Information loss in feature selection**:
   - Existing feature selection algorithms usually evaluate the importance of each feature independently, ignoring the correlations between features. Moreover, they select features for each task separately and fail to exploit the correlations among multiple related tasks.
3. **Insufficient labeled data**:
   - In real-world applications, manually labeling a large number of training samples is unrealistic, so making effective use of unlabeled data becomes an important issue.
4. **The advantages of multi-task learning are underused**:
   - Existing research on multi-task learning shows that learning multiple related tasks jointly can improve performance, but existing feature selection algorithms do not take full advantage of this.
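The assumption that related tasks share a common structure can be made concrete in several ways. A minimal sketch, assuming (this summary does not confirm it) that the shared structure is encouraged by a trace-norm penalty on the stacked weight matrix `W = [W_1, ..., W_T]`, is the proximal step below. The names `singular_value_threshold`, `shared_structure_step`, and `tau` are hypothetical.

```python
import numpy as np


def singular_value_threshold(W, tau):
    """Proximal operator of tau * ||W||_* (trace norm): shrink every singular value by tau."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    return (U * s_shrunk) @ Vt  # same as U @ diag(s_shrunk) @ Vt


def shared_structure_step(task_weights, tau):
    """Hypothetical multi-task step: stack per-task (d, c_t) weights column-wise,
    apply trace-norm shrinkage to the stack, and split the result back per task."""
    W = np.hstack(task_weights)
    W_new = singular_value_threshold(W, tau)
    splits = np.cumsum([w.shape[1] for w in task_weights])[:-1]
    return np.split(W_new, splits, axis=1)


# Usage: three tasks, each with a (4, 1) weight matrix
W_all = np.zeros((4, 3))
W_all[0, 0], W_all[1, 1], W_all[2, 2] = 5.0, 3.0, 0.5
tasks = [W_all[:, [0]], W_all[:, [1]], W_all[:, [2]]]
updated = shared_structure_step(tasks, tau=1.0)
print(np.linalg.svd(np.hstack(updated), compute_uv=False))
```

Shrinkage removes weak singular directions of the stacked matrix, so the per-task weights are pulled toward a shared low-rank subspace; in a full solver this step would alternate with updates of each task's own loss.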
### Solutions

To overcome the above problems, the authors propose the following:

- **Combining semi-supervised learning and multi-task learning**:
  - Use both labeled and unlabeled data for feature selection and take the correlations between features into account, thereby improving feature selection performance.
- **Introducing manifold learning**:
  - Exploit the manifold structure of multimedia data to better support feature space analysis.
- **Optimizing the objective function**:
  - Propose a fast-converging iterative algorithm to solve the objective function, which is non-smooth and difficult to optimize directly.

### Experimental verification

The authors verify the effectiveness of the proposed method through experiments on video classification, image annotation, human action recognition, and 3D motion data analysis. The results show that SFMC outperforms other state-of-the-art feature selection algorithms across these applications, especially when labeled data are scarce.

### Summary

The main contribution of this paper is to combine semi-supervised feature selection and multi-task learning in one framework, which both improves feature selection and copes better with insufficient labeled data.
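The manifold-learning step listed under Solutions can be sketched with a k-nearest-neighbour graph built over all labeled and unlabeled samples. This assumes the common unnormalized Laplacian `L = D - S` with heat-kernel weights and the smoothness term `Tr(F^T L F)`; the summary does not specify the graph construction, and `knn_graph_laplacian`, `manifold_smoothness`, `k`, and `sigma` are hypothetical.

```python
import numpy as np


def knn_graph_laplacian(X, k=5, sigma=1.0):
    """Hypothetical sketch: unnormalized Laplacian L = D - S over all samples.

    X: (n, d) samples as rows (labeled and unlabeled together); requires k < n.
    S_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)) if j is among the k nearest
    neighbours of i or vice versa, else 0.
    """
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    # Squared Euclidean distances; clip tiny negatives caused by round-off
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(dist2, np.inf)  # a sample is not its own neighbour
    S = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(dist2[i])[:k]
        S[i, nbrs] = np.exp(-dist2[i, nbrs] / (2.0 * sigma ** 2))
    S = np.maximum(S, S.T)  # keep an edge if either endpoint selected it
    L = np.diag(S.sum(axis=1)) - S
    return L, S


def manifold_smoothness(F, L):
    """Tr(F^T L F): small when the predicted label matrix F varies smoothly along the graph."""
    return float(np.trace(F.T @ L @ F))


# Usage: 30 samples, of which any subset could be unlabeled
rng = np.random.default_rng(1)
X_all = rng.standard_normal((30, 3))
L_graph, S_graph = knn_graph_laplacian(X_all, k=5, sigma=1.0)
F_pred = rng.standard_normal((30, 2))
print(manifold_smoothness(F_pred, L_graph))
```

Since `Tr(F^T L F) = 1/2 * sum_ij S_ij ||f_i - f_j||^2`, the term penalizes predicted label vectors that differ between neighbouring samples, which is how unlabeled samples can shape the solution without labels of their own.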