Automated compound classification using a chemical ontology
Claudia Bobach,Timo Böhme,Ulf Laube,Anett Püschel,Lutz Weber
DOI: https://doi.org/10.1186/1758-2946-4-40
2012-12-01
Journal of Cheminformatics
Abstract:Abstract Background Classification of chemical compounds into compound classes by using structure derived descriptors is a well-established method to aid the evaluation and abstraction of compound properties in chemical compound databases. MeSH and recently ChEBI are examples of chemical ontologies that provide a hierarchical classification of compounds into general compound classes of biological interest based on their structural as well as property or use features. In these ontologies, compounds have been assigned manually to their respective classes. However, with the ever increasing possibilities to extract new compounds from text documents using name-to-structure tools and considering the large number of compounds deposited in databases, automated and comprehensive chemical classification methods are needed to avoid the error prone and time consuming manual classification of compounds. Results In the present work we implement principles and methods to construct a chemical ontology of classes that shall support the automated, high-quality compound classification in chemical databases or text documents. While SMARTS expressions have already been used to define chemical structure class concepts, in the present work we have extended the expressive power of such class definitions by expanding their structure-based reasoning logic. Thus, to achieve the required precision and granularity of chemical class definitions, sets of SMARTS class definitions are connected by OR and NOT logical operators. In addition, AND logic has been implemented to allow the concomitant use of flexible atom lists and stereochemistry definitions. The resulting chemical ontology is a multi-hierarchical taxonomy of concept nodes connected by directed, transitive relationships. Conclusions A proposal for a rule based definition of chemical classes has been made that allows to define chemical compound classes more precisely than before. The proposed structure-based reasoning logic allows to translate chemistry expert knowledge into a computer interpretable form, preventing erroneous compound assignments and allowing automatic compound classification. The automated assignment of compounds in databases, compound structure files or text documents to their related ontology classes is possible through the integration with a chemical structure search engine. As an application example, the annotation of chemical structure files with a prototypic ontology is demonstrated.
chemistry, multidisciplinary,computer science, interdisciplinary applications, information systems