A novel computer-aided diagnosis system for breast MRI based on feature selection and ensemble learning.

Wei Lu, Zhe Li, Jinghui Chu
DOI: https://doi.org/10.1016/j.compbiomed.2017.03.002
2017-01-01
Abstract: Breast cancer is a common cancer among women. With the development of modern medical science and information technology, medical imaging techniques play an increasingly important role in the early detection and diagnosis of breast cancer. In this paper, we propose an automated computer-aided diagnosis (CADx) framework for magnetic resonance imaging (MRI). The scheme combines several machine learning techniques: ensemble under-sampling (EUS) for handling imbalanced data, the Relief algorithm for feature selection, the subspace method for providing data diversity, and Adaboost for improving the performance of base classifiers. We extracted morphological features, several kinds of texture features, and Gabor features. To keep the physical meaning of each feature subset clear, subspaces are built by combining the morphological features with each kind of texture or Gabor feature. We tested our proposal on a manually segmented Region of Interest (ROI) data set, which contains 438 images of malignant tumors and 1898 images of normal tissues or benign tumors. Our proposal achieves an area under the ROC curve (AUC) of 0.9617, which outperforms most other state-of-the-art breast MRI CADx systems. Compared with other methods, our proposal significantly reduces the false-positive classification rate.

Highlights:
The dimensionality of the features we use is larger than that of most state-of-the-art methods.
Various features, including morphological, Gabor, and several types of texture features, are extracted to characterize breast masses comprehensively.
We use Relief to select the optimal feature subset from the original feature set by feature type, which helps remove redundant and irrelevant features while taking the physical meaning of the features into account.
We propose a novel ensemble learning framework based on the combination of EUS, subspace, and Adaboost, which helps alleviate the data imbalance problem and improves the overall classification accuracy of the whole CADx system.
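
As a rough illustration of how the four components named in the abstract (EUS, Relief, feature subspaces, and Adaboost) might fit together, here is a minimal Python sketch. It is not the authors' implementation. The data are synthetic, and the function names, the subspace column groups, `top_k`, the basic binary Relief variant, the chunking rule used for under-sampling, and the score averaging at the end are all assumptions made for illustration. Only `AdaBoostClassifier` comes from scikit-learn; Relief and EUS are written by hand.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier


def relief_weights(X, y, n_iter=100, seed=0):
    """Basic binary Relief (assumed variant): reward features that differ at
    the nearest miss and penalise features that differ at the nearest hit."""
    rng = np.random.default_rng(seed)
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0                      # avoid division by zero
    Xn = (X - X.min(axis=0)) / span            # scale every feature to [0, 1]
    w = np.zeros(X.shape[1])
    for i in rng.integers(0, len(Xn), size=n_iter):
        dist = np.abs(Xn - Xn[i]).sum(axis=1)  # Manhattan distance to sample i
        dist[i] = np.inf                       # never pick the sample itself
        same = y == y[i]
        hit = np.argmin(np.where(same, dist, np.inf))
        miss = np.argmin(np.where(~same, dist, np.inf))
        w += np.abs(Xn[i] - Xn[miss]) - np.abs(Xn[i] - Xn[hit])
    return w / n_iter


def eus_subsets(y, seed=0):
    """Ensemble under-sampling (assumed rule): split the majority class
    (label 0) into chunks of roughly minority size and pair each chunk
    with all minority (label 1) samples."""
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == 1)
    majority = rng.permutation(np.flatnonzero(y == 0))
    n_chunks = max(1, len(majority) // len(minority))
    return [np.concatenate([chunk, minority])
            for chunk in np.array_split(majority, n_chunks)]


def fit_ensemble(X, y, groups, top_k=5, seed=0):
    """Train one AdaBoost model per (feature subspace, balanced subset) pair.
    `groups` maps a hypothetical subspace name to its column indices."""
    members = []
    for cols in groups.values():
        cols = np.asarray(cols)
        w = relief_weights(X[:, cols], y, seed=seed)
        keep = cols[np.argsort(w)[::-1][:top_k]]   # top_k highest-weight columns
        for idx in eus_subsets(y, seed=seed):
            clf = AdaBoostClassifier(n_estimators=50, random_state=seed)
            clf.fit(X[idx][:, keep], y[idx])
            members.append((keep, clf))
    return members


def predict_score(members, X):
    """Average the positive-class probability over all members (assumption)."""
    return np.mean([clf.predict_proba(X[:, keep])[:, 1]
                    for keep, clf in members], axis=0)


# Synthetic stand-in data: 200 "benign/normal" and 50 "malignant" samples.
rng = np.random.default_rng(0)
y = np.array([0] * 200 + [1] * 50)
X = rng.normal(size=(250, 12))
X[:, 0] += 2.0 * y                                 # one informative feature

# Hypothetical subspaces: shared "morphological" columns 0-2 plus one other kind.
groups = {
    "morph+texture": [0, 1, 2, 3, 4, 5, 6],
    "morph+gabor": [0, 1, 2, 7, 8, 9, 10, 11],
}
members = fit_ensemble(X, y, groups)
scores = predict_score(members, X)
```

With the class sizes reported in the abstract (438 malignant, 1898 other), the chunking rule above would produce four balanced subsets per subspace. The averaged `scores` can be passed to any ROC routine to compute an AUC on held-out data.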