THU TREC2002 Web Track Experiments

Min Zhang, Ruihua Song, Chuan Lin, Shaoping Ma, Zhe Jiang, Yijiang Jin, Yiqun Liu, Le Zhao
2002-01-01
Abstract: ...consistent with former ones. In addition, the use of the URL and of the links inside a web page was also examined. Again, results on the training set are encouraging. We assumed that a key resource is more likely to link to multiple relevant documents. The out-degree of a page and the similarities of the documents it points to were therefore used as two factors for key resource selection. Experimental results were quite good, showing that these factors can find key resources on one server. Two site uniting (SU) approaches were studied to select proper pages as the representation of one server. (1) A document that has index characteristics and a high enough similarity is kept as a key resource. (2) Documents from the same server in the result list are given different reliability factors, which decay as similarity decreases. Both are useful on the examples given in this year's Web track (used as the training set), especially the latter. Better results were obtained by combining the SU approach with the out-degree factor mentioned above to find key resources. All experiments were run on the Okapi system. There are quite a few parameters to tune, and they affect performance greatly. Therefore, we also proposed and implemented a genetic-algorithm-based dynamic parameter learning approach for all the tasks.
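The second site uniting approach can be illustrated with a minimal Python sketch. The abstract does not give the decay formula, so everything here is an assumption made for illustration: the function name `site_uniting_rerank`, the geometric decay (`decay ** k` for the k-th document seen from a server), the `decay` value, and the example URLs. It is not the authors' implementation.

```python
from collections import defaultdict
from urllib.parse import urlparse


def site_uniting_rerank(results, decay=0.5):
    """Re-rank (url, similarity) pairs with a per-server reliability factor.

    Assumption (not from the paper): the factor is decay**k for the k-th
    document of a server, counted in order of decreasing similarity.
    """
    # Visit documents from most to least similar so that, within each
    # server, the reliability factor shrinks as similarity decreases.
    ordered = sorted(results, key=lambda r: r[1], reverse=True)
    seen_per_server = defaultdict(int)
    rescored = []
    for url, similarity in ordered:
        server = urlparse(url).netloc
        factor = decay ** seen_per_server[server]  # 1, decay, decay^2, ...
        seen_per_server[server] += 1
        rescored.append((url, similarity * factor))
    # Final ranking uses the discounted scores.
    return sorted(rescored, key=lambda r: r[1], reverse=True)


# Hypothetical result list: three pages from one server, one from another.
results = [
    ("http://a.example/index.html", 0.90),
    ("http://a.example/page1.html", 0.80),
    ("http://b.example/index.html", 0.70),
    ("http://a.example/page2.html", 0.60),
]
reranked = site_uniting_rerank(results, decay=0.5)
for url, score in reranked:
    print(f"{score:.3f}  {url}")
```

In this toy list, the lower-similarity pages from `a.example` are discounted, so the top page of `b.example` moves ahead of them. The out-degree factor described in the abstract could be multiplied into the same score, but how the two were actually combined is not stated here.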