TREC 2003 Novelty and Web Track at ICT.

Jian Sun, Zhe Yang, Wenfeng Pan, Huaping Zhang, Bin Wang, Xueqi Cheng
DOI: https://doi.org/10.6028/nist.sp.500-255.novelty-cas-ict.bin
2003-01-01
Abstract: This paper presents our approaches and experiments for two tracks of TREC 2003: the Novelty track and the Web track. The Novelty track can be treated as a binary classification problem: relevant vs. irrelevant sentences, or new vs. non-new sentences. We therefore applied variants of techniques that have been used for text categorization. To retrieve relevant sentences, we compute the similarity between the topic and each sentence using the vector space model (VSM); if the similarity exceeds a certain threshold, the sentence is considered relevant. We also tried several techniques to improve performance: using the narrative field and adopting a dynamic threshold for different documents. For novelty task 3, we implemented the KNN and Winnow algorithms to classify sentences as relevant or irrelevant. To detect new sentences among the relevant ones, we used the Maximum Marginal Relevance (MMR) measure, the Winnow algorithm, and word overlap between sentences. In addition, we attempted to detect novelty by computing the semantic distance between sentences using WordNet. For the Web track, we improved the basic SMART system and introduced the Lnu-Ltu weighting method into it. The improved system proved effective in last year's task. We also implemented a simple retrieval system using the probabilistic model adopted by Okapi. The paper is structured as follows: Section 2 reports our approaches and experiments in the Novelty track, Section 3 describes the experiments in the Web track, and Section 4 concludes by summarizing our experiments and outlining future work.
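The two-stage idea described in the abstract (threshold a VSM similarity to select relevant sentences, then use word overlap to filter out non-new ones) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the tokenizer, the raw term-frequency weighting, the fixed thresholds `threshold` and `max_overlap`, and the function names are all assumptions chosen for illustration, and the abstract does not specify any of them.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (an assumed, simplistic tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def relevant_sentences(topic, sentences, threshold=0.2):
    """Keep sentences whose similarity to the topic exceeds a fixed threshold.

    The threshold value is hypothetical; the abstract mentions a dynamic,
    per-document threshold that is not modeled here.
    """
    topic_vec = Counter(tokenize(topic))
    return [s for s in sentences
            if cosine(topic_vec, Counter(tokenize(s))) > threshold]


def novel_sentences(sentences, max_overlap=0.6):
    """Keep a sentence if the fraction of its words already seen in earlier
    sentences is below max_overlap (an assumed cutoff)."""
    seen = set()
    kept = []
    for s in sentences:
        words = set(tokenize(s))
        if not words:
            continue
        overlap = len(words & seen) / len(words)
        if overlap < max_overlap:
            kept.append(s)
        # Every earlier relevant sentence counts as "seen", kept or not.
        seen |= words
    return kept


topic = "oil spill cleanup"
sentences = [
    "The oil spill reached the coast.",
    "Cleanup crews removed oil from the coast.",
    "The stock market rose today.",
    "The oil spill reached the coast on Monday.",
]
relevant = relevant_sentences(topic, sentences)
novel = novel_sentences(relevant)
```

In this toy run the stock-market sentence shares no terms with the topic and is dropped at the relevance stage, while the last sentence is dropped at the novelty stage because most of its words already appeared in the first one. MMR, Winnow, or a WordNet-based semantic distance would replace the overlap test in `novel_sentences`.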