A Term Co-Occurrence Algorithm and the Effect of Co-Occurrence Terms on Result Ranking for Information Retrieval

CHEN Chong, PENG Bo, YAN Hongfei, WANG Jimin
DOI: https://doi.org/10.3321/j.issn:1000-0054.2005.09.029
2005-01-01
Abstract: Terms that co-occur with query words are hypothesized to help discriminate among source documents. This paper studies the effect of co-occurrence terms on reranking relevant documents in information retrieval systems. A new algorithm, FDC (frequency, term distance, co-collection ratio), is proposed to extract the most significant terms that co-occur with query words in documents. The algorithm considers both single-document statistics (co-occurrence frequency and term distance) and a global collection statistic (the co-collection ratio). Reranking performance is evaluated with discounted cumulative gain (DCG), using the query and clickthrough logs of the Tianwang search engine. Comparing FDC with the latent semantic indexing (LSI) method for extracting co-occurrence terms, we found that reranking with FDC improves retrieval performance over the baseline at a 99% confidence level, and reranking with LSI gives a similar result. The results show that the co-occurrence terms of query words can improve relevance when used to rerank the original results, and that this improvement does not depend on a specific extraction algorithm.
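The abstract names the three FDC signals but does not give the formula, so the following is only an illustrative sketch under stated assumptions. It combines the signals in a hypothetical way (distance-weighted co-occurrence counts multiplied by an assumed co-collection ratio), adds the resulting term weights as a bonus to baseline scores for reranking, and scores a ranking with DCG using the common 1/log2(i + 1) discount. The function names `cooccurrence_scores`, `rerank`, and `dcg`, the parameters `window` and `alpha`, and the exact definition of the co-collection ratio are all hypothetical; this is not the authors' implementation.

```python
import math
from collections import defaultdict


def cooccurrence_scores(docs, query, window=5):
    """Score terms that co-occur with `query` in a list of tokenized documents.

    HYPOTHETICAL stand-in for FDC: it uses the three signals the abstract names,
    but the way they are combined here is an assumption, not the paper's formula.
      - frequency and distance: each term occurrence within `window` tokens of a
        query occurrence adds 1 / distance
      - co-collection ratio (assumed definition): documents where the term
        co-occurs with the query / documents that contain the term at all
    """
    local = defaultdict(float)    # distance-weighted co-occurrence counts
    co_docs = defaultdict(int)    # docs where the term co-occurs with the query
    term_docs = defaultdict(int)  # docs containing the term anywhere
    for tokens in docs:
        for term in set(tokens):
            term_docs[term] += 1
        seen = set()
        for p, tok in enumerate(tokens):
            if tok != query:
                continue
            for j in range(max(0, p - window), min(len(tokens), p + window + 1)):
                term = tokens[j]
                if term != query:  # also skips j == p, so no division by zero
                    local[term] += 1.0 / abs(j - p)
                    seen.add(term)
        for term in seen:
            co_docs[term] += 1
    return {t: local[t] * co_docs[t] / term_docs[t] for t in local}


def rerank(results, term_weights, alpha=1.0):
    """Re-sort (doc_id, base_score, tokens) by base score plus a co-occurrence bonus.

    `alpha` is a hypothetical mixing weight. Python's sort is stable, so documents
    with equal boosted scores keep their original order.
    """
    def boosted(item):
        _, base, tokens = item
        present = set(tokens)
        return base + alpha * sum(w for t, w in term_weights.items() if t in present)

    return sorted(results, key=boosted, reverse=True)


def dcg(gains, k=None):
    """DCG@k with the common discount: sum of gain_i / log2(i + 1), ranks from 1."""
    gains = gains if k is None else gains[:k]
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))


if __name__ == "__main__":
    docs = [["jaguar", "car", "speed", "engine"],
            ["jaguar", "cat", "jungle"],
            ["car", "engine", "oil"]]
    weights = cooccurrence_scores(docs, "jaguar", window=2)
    print(weights)  # {'car': 0.5, 'speed': 0.5, 'cat': 1.0, 'jungle': 0.5}
```

In the demo, "cat" scores higher than "car" even though each appears once directly next to "jaguar", because "car" also occurs in a document without the query word and so receives an assumed co-collection ratio of 1/2. "engine" gets no weight at all because it lies outside the two-token window.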