Graph Based Feature Selection Investigating Boundary Region of Rough Set for Language Identification

Ghazaala Yasmin, Asit Kumar Das, Janmenjoy Nayak, Danilo Pelusi, Weiping Ding
DOI: https://doi.org/10.1016/j.eswa.2020.113575
IF: 8.5
2020-01-01
Expert Systems with Applications
Abstract: Spoken language is a rich source of information. Many countries contain regions that are distinguished by their languages, and the challenge is to automate spoken language recognition through machine learning. The proposed language identification system extracts various features from speech in different languages and constructs a complete weighted graph, with the extracted features as nodes and the similarity between features as edge weights. Similarity values are computed using the positive region and boundary region concepts of rough set theory, and a graph based feature selection algorithm is devised to select a minimal subset of features relevant to language identification. Investigating the boundary region together with the positive region is observed to extract more valuable information, which helps in selecting more relevant features for language identification. The complete weighted graph is made sparse using a Gini index based sparsity measure, so that it retains only the edges whose terminal nodes are highly similar. Next, a maximal spanning tree of the graph is generated using Prim's algorithm. This tree is a basic structure that provides the maximal similarity among the nodes in the graph. Finally, the score of each node is computed from the weights of the edges in the tree, and the node with the highest score is selected and removed from the spanning tree. This selection and removal of nodes continues until the graph becomes null. The resulting set of selected nodes is taken as the important feature subset of the audio speech used for language identification. Experimental results show the effectiveness of the proposed rough set theory based feature selection method and demonstrate the usefulness of investigating the boundary region of rough sets. (C) 2020 Elsevier Ltd. All rights reserved.
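The graph stage described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions: the feature similarity matrix is taken as given rather than computed from rough set positive and boundary regions; a fixed `threshold` stands in for the Gini index based sparsity measure; a node's score is assumed to be the sum of its incident tree edge weights; and "the graph becomes null" is read as "no tree edges remain". The names `maximum_spanning_forest` and `select_features` are hypothetical.

```python
import heapq


def maximum_spanning_forest(n, edges):
    """Prim's algorithm, always taking the heaviest edge that reaches a new node.

    `edges` maps (u, v) pairs with u < v to a similarity weight.
    Returns the tree edges in the same format.
    """
    adj = {i: [] for i in range(n)}
    for (u, v), w in edges.items():
        adj[u].append((w, v))
        adj[v].append((w, u))

    visited = set()
    tree = {}
    # Restart from every unvisited node: pruning may have disconnected the graph.
    for start in range(n):
        if start in visited:
            continue
        visited.add(start)
        # Negate weights because heapq is a min-heap.
        heap = [(-w, start, v) for w, v in adj[start]]
        heapq.heapify(heap)
        while heap:
            neg_w, u, v = heapq.heappop(heap)
            if v in visited:
                continue
            visited.add(v)
            tree[(min(u, v), max(u, v))] = -neg_w
            for w, x in adj[v]:
                if x not in visited:
                    heapq.heappush(heap, (-w, v, x))
    return tree


def select_features(similarity, threshold):
    """Return feature indices in the order they are selected."""
    n = len(similarity)
    # Assumption: a fixed threshold replaces the Gini index based sparsification.
    edges = {
        (i, j): similarity[i][j]
        for i in range(n)
        for j in range(i + 1, n)
        if similarity[i][j] >= threshold
    }
    tree = maximum_spanning_forest(n, edges)

    selected = []
    while tree:  # assumption: stop once no tree edges remain
        # Assumed score: sum of the weights of the tree edges touching a node.
        score = {}
        for (u, v), w in tree.items():
            score[u] = score.get(u, 0.0) + w
            score[v] = score.get(v, 0.0) + w
        # Ties are broken in favour of the smallest node index.
        best = max(sorted(score), key=score.get)
        selected.append(best)
        # Removing the node also removes its incident tree edges.
        tree = {e: w for e, w in tree.items() if best not in e}
    return selected


# Toy 4-feature similarity matrix (made-up values, for illustration only).
S = [
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.7, 0.2],
    [0.8, 0.7, 1.0, 0.6],
    [0.1, 0.2, 0.6, 1.0],
]
print(select_features(S, threshold=0.5))
```

Under these assumptions, nodes that are never selected end up isolated once their neighbours are removed, which is how the sketch yields a subset smaller than the full feature set.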