Abstract:

Background: The Gene Ontology (GO) is a well-known controlled vocabulary describing the biological process, molecular function and cellular component aspects of gene annotation. It has become a widely used knowledge source in bioinformatics for annotating genes and measuring their semantic similarity. These measures generally involve the GO graph structure, the information content of GO aspects, or a combination of both. However, only a few of the semantic similarity measures described so far can handle GO annotations differently according to their origin (i.e. their evidence codes).

Results: We present here a new semantic similarity measure called IntelliGO, which integrates several complementary properties in a novel vector space model. The coefficients associated with each GO term that annotates a given gene or protein include its information content as well as a customized value for each type of GO evidence code. The generalized cosine similarity measure, used for calculating the dot product between two vectors, has been rigorously adapted to the context of the GO graph. The IntelliGO similarity measure is tested on two benchmark datasets consisting of KEGG pathways and Pfam domains grouped as clans, considering the GO biological process and molecular function terms, respectively, for a total of 683 yeast and human genes and involving more than 67,900 pair-wise comparisons. The ability of the IntelliGO similarity measure to express the biological cohesion of sets of genes compares favourably to four existing similarity measures. For inter-set comparison, it consistently discriminates between distinct sets of genes. Furthermore, the IntelliGO similarity measure allows the influence of weights assigned to evidence codes to be checked. Finally, the results obtained with a complementary reference technique give intermediate but correct correlation values with the sequence similarity, Pfam, and Enzyme classifications when compared to previously published measures.

Conclusions: The IntelliGO similarity measure provides a customizable and comprehensive method for quantifying gene similarity based on GO annotations. It also displays a robust set-discriminating power, which suggests it will be useful for functional clustering.

Availability: An on-line version of the IntelliGO similarity measure is available at: http://bioinfo.loria.fr/Members/benabdsi/intelligo_project/
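To make the vector space idea in the Results concrete, the following is a minimal sketch of a generalized cosine similarity between two annotated genes. It is not the IntelliGO implementation: the evidence-code weights, information content values, term identifiers (`GO:A`, `GO:B`, `GO:C`) and the term-to-term similarity table are all hypothetical toy values, and combining weight and information content by multiplication is an assumption made only for illustration. In practice the similarity between basis vectors would be derived from the GO graph.

```python
import math

# Hypothetical evidence-code weights (illustrative values, not from the paper).
EVIDENCE_WEIGHT = {"EXP": 1.0, "ISS": 0.7, "IEA": 0.4}

# Hypothetical information content per GO term (toy identifiers and values).
INFO_CONTENT = {"GO:A": 2.0, "GO:B": 1.5, "GO:C": 3.0}

# Hypothetical similarity between basis vectors (GO terms).
# Assumed symmetric, with 1.0 for identical terms and 0.0 when unlisted.
TERM_SIM = {
    ("GO:A", "GO:B"): 0.5,
    ("GO:A", "GO:C"): 0.0,
    ("GO:B", "GO:C"): 0.25,
}


def term_sim(t1, t2):
    """Similarity of two basis vectors (terms), looked up in the toy table."""
    if t1 == t2:
        return 1.0
    return TERM_SIM.get((t1, t2), TERM_SIM.get((t2, t1), 0.0))


def gene_vector(annotations):
    """Build a sparse vector {term: coefficient} from (term, evidence_code) pairs.

    Assumption: the coefficient is evidence weight times information content;
    if a term is annotated several times, the largest coefficient is kept.
    """
    vec = {}
    for term, code in annotations:
        coeff = EVIDENCE_WEIGHT[code] * INFO_CONTENT[term]
        vec[term] = max(vec.get(term, 0.0), coeff)
    return vec


def dot(u, v):
    """Generalized dot product: basis vectors are not assumed orthogonal."""
    return sum(a * b * term_sim(s, t) for s, a in u.items() for t, b in v.items())


def generalized_cosine(u, v):
    """Generalized cosine similarity between two sparse gene vectors."""
    denom = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / denom if denom else 0.0


gene1 = gene_vector([("GO:A", "EXP"), ("GO:B", "IEA")])
gene2 = gene_vector([("GO:C", "ISS")])
print(generalized_cosine(gene1, gene2))
```

Because the dot product sums over all pairs of terms, two genes that share no GO term can still obtain a non-zero similarity when their terms are related, which is the point of generalizing the cosine measure beyond an orthogonal basis.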

Investigating semantic similarity measures across the Gene Ontology: the relationship between sequence and annotation

From Ontology to Semantic Similarity: Calculation of Ontology-Based Semantic Similarity

A new approach to measure the semantic similarity of gene annotation

Inferring Semantic Similarity Through Correlating Information Contents Of Gene Ontology Terms

IntelliGO: a new vector-based semantic similarity measure including annotation origin

A Novel Comprehensive Approach for Estimating Concept Semantic Similarity in WordNet

Semantic Similarity from Natural Language and Ontology Analysis

Tags Are Related: Measurement of Semantic Relatedness Based on Folksonomy Network

Information Content-Based Gene Ontology Semantic Similarity Approaches: Toward a Unified Framework Theory

Evaluation of GO-based Functional Similarity Measures Using S. Cerevisiae Protein Interaction and Expression Profile Data

Measuring gene functional similarity based on group-wise comparison of GO terms

A Hybrid Semantic Similarity Measurement for Geospatial Entities

A Measure of Semantic Similarity Between Gene Ontology Terms Based on Semantic Pathway Covering

A semantic similarity measure based on information distance for ontology alignment

Semantic Similarity in a Taxonomy: An Information-Based Measure and its Application to Problems of Ambiguity in Natural Language

Semantic Measures for the Comparison of Units of Language, Concepts or Instances from Text and Knowledge Base Analysis

Comprehensive weighting method for calculation of ontology-based semantic similarity

simona: a comprehensive R package for semantic similarity analysis on bio-ontologies

Novel symmetry-based gene-gene dissimilarity measures utilizing Gene Ontology: Application in gene clustering

Semantic Search among Heterogeneous Biological Databases Based on Gene Ontology

Description and Evaluation of Semantic Similarity Measures Approaches