Abstract: Objective: Biomedical document conceptualization is the process of clustering biomedical documents based on ontology-represented domain knowledge. The result of this process is a representation of the biomedical documents by a set of key concepts and their relationships. Most clustering methods cluster documents based on invariant domain knowledge. The objective of this work is to develop an effective method to cluster biomedical documents based on various user-specified ontologies, so that users can exploit the concept structures of documents more effectively. Methods: We develop a flexible framework that allows users to specify the knowledge bases in the form of ontologies. Based on the user-specified ontologies, we develop a key concept induction algorithm, which uses latent semantic analysis to identify key concepts and cluster documents. A corpus-related ontology generation algorithm is developed to generate the concept structures of documents. Results: Based on two biomedical datasets, we evaluate the proposed method and five other clustering algorithms. The proposed method outperforms the five other algorithms in terms of key concept identification. On the first biomedical dataset, our method achieves F-measure values of 0.7294 and 0.5294 based on the MeSH ontology and the Gene Ontology (GO), respectively. On the second biomedical dataset, our method achieves F-measure values of 0.6751 and 0.6746 based on the MeSH ontology and GO, respectively. On both datasets, these values exceed those of the five other algorithms. Based on the MeSH ontology and GO, the generated corpus-related ontologies show informative conceptual structures. Conclusions: The proposed method enables users to specify the domain knowledge used to exploit the conceptual structures of biomedical document collections. In addition, the proposed method is able to extract the key concepts and cluster the documents with relatively high precision.
(C) 2010 Elsevier B.V. All rights reserved.
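
The abstract reports F-measure values but does not say which variant is used. As an illustration only, the sketch below computes one common clustering F-measure: each reference class is matched to the cluster that gives it the highest F-score, and the scores are averaged with weights proportional to class size. The function name `clustering_f_measure` and the toy labels are hypothetical, and this is an assumption about how such a score could be computed, not the paper's evaluation procedure.

```python
from collections import Counter


def clustering_f_measure(true_labels, cluster_ids):
    """Class-weighted best-match F-measure for a flat clustering.

    Assumed variant (not confirmed by the abstract): for each reference
    class, take the best F-score over all clusters, then average the
    per-class scores weighted by class size.
    """
    if len(true_labels) != len(cluster_ids):
        raise ValueError("label lists must have the same length")
    n = len(true_labels)
    if n == 0:
        raise ValueError("at least one document is required")

    class_sizes = Counter(true_labels)
    cluster_sizes = Counter(cluster_ids)
    # Number of documents shared by each (class, cluster) pair.
    overlap = Counter(zip(true_labels, cluster_ids))

    total = 0.0
    for cls, cls_size in class_sizes.items():
        best = 0.0
        for clu, clu_size in cluster_sizes.items():
            shared = overlap[(cls, clu)]
            if shared == 0:
                continue
            precision = shared / clu_size
            recall = shared / cls_size
            f_score = 2 * precision * recall / (precision + recall)
            best = max(best, f_score)
        total += (cls_size / n) * best
    return total


# Toy example: four documents, two reference classes.
perfect = clustering_f_measure(["a", "a", "b", "b"], [0, 0, 1, 1])
merged = clustering_f_measure(["a", "a", "b", "b"], [0, 0, 0, 0])
```

In the toy example, a clustering that reproduces the reference classes scores 1.0, while merging all documents into one cluster lowers the score because precision drops for every class.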

Adaptive Concept Resolution for document representation and its applications in text mining.

Learning ontology resolution for document representation and its applications in text mining.

Incorporating Knowledge into Neural Network for Text Representation.

Model Semantic Relations with Extended Attributes

An Ontology-based Approach to Topic-specific Web Resource Discovery

An Ontology-Based Automatic Semantic Annotation Approach for Patent Document Retrieval in Product Innovation Design

Semantic Annotation with RescoredESA

Ontology enhancement and concept granularity learning: keeping yourself current and adaptive.

Cce: A Chinese Concept Encyclopedia Incorporating The Expert-Edited Chinese Concept Dictionary With Online Cyclopedias

Effective Term-Concept Mapping Method Based on Ontology

A comparative study for wordnet guided text representation

Approach for multi-dimensional associated heterogeneous engineering document semantic retrieval

Read Extensively, Focus Smartly: A Cross-document Semantic Enhancement Method for Visual Documents NER.

A knowledge-driven approach to biomedical document conceptualization

A Data-Driven Text Mining and Semantic Network Analysis for Design Information Retrieval

Entity Linking with Effective Acronym Expansion, Instance Selection and Topic Modeling

ACCoRD: A Multi-Document Approach to Generating Diverse Descriptions of Scientific Concepts

Concept over Time: the Combination of Probabilistic Topic Model with Wikipedia Knowledge.

An Integrated Solution for Improving Semantic Content Searching in Distributed Environment.

Extend Concepts Extendable Related Concepts Core Ontology IC Value Extraction Algorithm Insert Semantic Relationships WordNet Target Ontology

Approach To Indefinite Semantic Conflicts Of Words In Collaborative Editing Of Design Documents