Feature selection and transduction for prediction of molecular bioactivity for drug design

Jason Weston, Fernando Perez-Cruz, Olivier Bousquet, Olivier Chapelle, Andre Elisseeff, Bernhard Schölkopf
2003-04-12
Abstract: Motivation: In drug discovery, a key task is to identify characteristics that separate active (binding) compounds from inactive (non-binding) ones. An automated prediction system can help reduce the resources necessary to carry out this task. Results: Two methods for prediction of molecular bioactivity for drug design are introduced and shown to perform well on a data set previously studied as part of the KDD (Knowledge Discovery and Data Mining) Cup 2001. The data is characterized by very few positive examples, a very large number of features (describing three-dimensional properties of the molecules) and rather different distributions between training and test data. Two techniques are introduced specifically to tackle these problems: a feature selection method for unbalanced data and a classifier which adapts to the distribution of the unlabeled test data (a so-called transductive method). We show both …