Improved Relation-based Information Retrieval Technology

Computer Science
Abstract: One limitation of traditional relationship-based IR methods is that a relation is often recorded in a binary form, such as R(First Term, Second Term), which consists only of general information about a pair of terms that are semantically and syntactically related to each other. To tackle this problem, we explore an improved technique that uses triples in information retrieval for precision-focused biomedical literature search. In this paper, a triple is defined as a data structure that integrates a pair of concepts with a verb phrase, or sometimes a special noun, extracted from the sentence as the relation between those concepts; it stores both the relation and the concept information. Unlike the traditional relationship-based model, our model represents a document or a query by a set of triples, such as R(relation)[First Concept, Second Concept]. Since semantic and syntactic exceptions occur in documents and queries, different types of triples should be permitted. For example, the query "What does the mad cow disease come from?" has the triple R(come from)[First Concept(mad cow disease), Unknown]. We can therefore obtain the answer for the unknown element of the query if some documents have matching triples in the index. We also apply an advanced ontology-based approach to extract generic concepts and their relations using both UMLS and WordNet, and we have implemented a new approach to rank retrieved passages from the same or different documents, following the protocol for measuring system performance in the TREC 2007 Genomics Track. A new version (which we call IRIRS) of the relation-based IR system developed by the DM Bioinformatics Lab of Drexel University in 2004 (which we call RIRS) is then built for improved relation-based search in the area of biomedical literature IR and DM. We use IRIRS to improve the retrieval results on tests of English reading comprehension. The experiment shows promising performance for relation-based IR. Mean average passage precision (MAPP), the character-based MAP measuring passage-level retrieval performance, for 64 topics rises significantly from 64.44% (the result of RIRS) to 74.28%. Furthermore, the experiment shows that the relation and triple structure is more expressive for representing information needs, especially in the area of biomedical literature.
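The triple representation R(relation)[First Concept, Second Concept] and the matching of a query triple with an unknown slot can be illustrated with a minimal Python sketch. This is not the IRIRS implementation: the names `Triple`, `_norm`, `_slot_matches`, and `answer` are hypothetical, the index entries are invented toy data, and plain lowercasing stands in for the UMLS and WordNet concept extraction described in the abstract.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Triple:
    """R(relation)[first concept, second concept]; None marks an unknown slot."""
    relation: str
    first: Optional[str]
    second: Optional[str]


def _norm(text: str) -> str:
    # Assumption: simple case and whitespace normalisation.
    # An ontology-based concept lookup would replace this step.
    return " ".join(text.lower().split())


def _slot_matches(query_slot: Optional[str], doc_slot: Optional[str]) -> bool:
    # An unknown query slot matches any concept present in the document triple.
    if query_slot is None:
        return doc_slot is not None
    return doc_slot is not None and _norm(query_slot) == _norm(doc_slot)


def answer(query: Triple, index: Iterable[Triple]) -> List[str]:
    """Return the concepts that fill the unknown slot(s) of the query triple."""
    answers: List[str] = []
    for t in index:
        if _norm(t.relation) != _norm(query.relation):
            continue
        if not (_slot_matches(query.first, t.first)
                and _slot_matches(query.second, t.second)):
            continue
        if query.first is None:
            answers.append(t.first)
        if query.second is None:
            answers.append(t.second)
    return answers


# Toy index, invented for illustration only.
index = [
    Triple("come from", "mad cow disease", "prions"),
    Triple("come from", "influenza", "virus"),
    Triple("cause", "prions", "mad cow disease"),
]

# Query: "What does the mad cow disease come from?"
query = Triple("come from", "mad cow disease", None)
result = answer(query, index)
```

Using `None` for the unknown slot lets the same structure serve both documents and queries, which is one simple way to permit the different triple types mentioned above. Ranking of the matched passages is not covered by this sketch.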