MIRACLE's Approach to Multilingual Web Retrieval

Á. Martínez-González, J. Martínez-Fernández, César de Pablo-Sánchez, Julio Villena-Román, Luis Jiménez-Cuadrado, Paloma Martínez, José Carlos González
Abstract: For the MIRACLE team's participation in WebClef 2005, a set of independent indexes was constructed for each top-level domain of the EuroGOV collection. Each of these indexes contains information extracted from the documents, such as URL, title, keywords, detected named entities, or HTML headers. The indexes are queried to obtain partial document rankings, which are then combined using various relative weights to test the value of each index. The trie-based indexing and retrieval engine developed by the MIRACLE team is now fully functional; it has been adapted to the WebClef environment and was used in this campaign. Other tools, such as a named entity recognizer based on a finite automaton, have also been developed.
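
The combination of partial rankings described in the abstract can be illustrated with a minimal sketch. The abstract does not state the fusion formula, so the weighted sum of max-normalized scores used here is an assumption, and the function name `combine_rankings`, the index names, the scores, and the weights are all hypothetical.

```python
from collections import defaultdict


def combine_rankings(partial_rankings, weights):
    """Fuse per-index rankings with a weighted sum of max-normalized scores.

    partial_rankings: {index_name: {doc_id: score}}  (hypothetical layout)
    weights:          {index_name: relative weight}  (hypothetical values)
    """
    fused = defaultdict(float)
    for index_name, ranking in partial_rankings.items():
        if not ranking:
            continue  # this index returned nothing for the query
        top = max(ranking.values())
        if top <= 0:
            continue  # cannot normalize non-positive scores
        weight = weights.get(index_name, 0.0)
        for doc_id, score in ranking.items():
            # Normalize so that indexes with different score scales are comparable.
            fused[doc_id] += weight * score / top
    # Highest fused score first; ties are broken by document id.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Toy partial rankings from two assumed indexes.
partial = {
    "title": {"doc1": 2.0, "doc2": 1.0},
    "url": {"doc2": 4.0, "doc3": 2.0},
}
weights = {"title": 0.5, "url": 0.5}
ranked = combine_rankings(partial, weights)
```

Varying the entries of `weights` and rerunning the fusion is one simple way to probe how much each index contributes to the final ranking, which is the kind of comparison the abstract describes.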