Identification of a Preferred Set of Molecular Descriptors for Compound Classification Based on Principal Component Analysis

Ling Xue, Jeff Godden, Hua Gao, Jürgen Bajorath
DOI: https://doi.org/10.1021/ci980231d
1999-05-18
Journal of Chemical Information and Computer Sciences
Abstract: An algorithm based on principal component analysis was investigated to classify molecules in a database consisting of 455 compounds with activities against seven different biological targets. Diversity profiles of these compound sets were calculated and compared. To effectively classify compounds with similar biological activity, all possible combinations of 17 molecular descriptors were tested by complete factorial analysis, and preferred descriptor combinations were identified. High efficiency was achieved for a combination of a limited set of structural keys and two or three additional 2D descriptors. The performance of the approach was compared to Jarvis-Patrick clustering.
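To give a sense of the size of the search implied by testing all possible combinations of 17 descriptors, the following minimal sketch enumerates every non-empty descriptor subset. It is an illustration only, not the authors' implementation: the descriptor names `d1` to `d17` and the helper `descriptor_subsets` are hypothetical, and the sketch assumes the empty combination is excluded. The abstract does not specify how each combination was scored, so no scoring step is shown.

```python
from itertools import combinations


def descriptor_subsets(descriptors):
    """Yield every non-empty combination of the given descriptors."""
    for size in range(1, len(descriptors) + 1):
        yield from combinations(descriptors, size)


# Hypothetical placeholder names; the abstract does not list the 17 descriptors.
descriptors = [f"d{i}" for i in range(1, 18)]

# Count the combinations an exhaustive (complete factorial) search would visit.
n_subsets = sum(1 for _ in descriptor_subsets(descriptors))
print(n_subsets)  # 2**17 - 1 = 131071
```

With 17 descriptors the exhaustive search covers 131,071 combinations, which is small enough to evaluate each one individually.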