Abstract: Maps are an important source of information in archaeology and other sciences. Users search for historical maps to determine the recorded political geography of regions in different eras, to find out exactly where archaeological artifacts were discovered, and so on. Currently, they have to use a generic search engine and add the term "map" to their other keywords. This crude method generates a significant number of false positives that the user must sift through to get the desired results. To reduce this manual effort, we propose an automatic map identification, indexing, and retrieval system that enables users to search and retrieve maps appearing in a large corpus of digital documents using simple keyword queries. We identify features that help distinguish maps from other figures in digital documents and show how a Support-Vector-Machine-based classifier can be used to identify maps. We propose map-level metadata (e.g., captions and references to the maps in the text) and document-level metadata (e.g., title, abstract, citations, and how recent the publication is), and show how they can be automatically extracted and indexed. Our novel ranking algorithm weights different metadata fields differently and also uses the document-level metadata to help rank retrieved maps. Empirical evaluations show which features should be selected and which metadata fields should be weighted more. We also demonstrate improved retrieval results in comparison to adaptations of existing methods for map retrieval. Our map search engine has been deployed in an online map-search system that is part of the Blind-Review digital library system.
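The idea of weighting metadata fields differently during ranking can be illustrated with a minimal sketch. This is not the ranking algorithm of the paper: the field names, the weight values in `FIELD_WEIGHTS`, the `MapRecord` type, and the simple term-count scoring are all assumptions chosen for illustration, whereas the abstract states that the actual weights are determined empirically.

```python
from dataclasses import dataclass

# Hypothetical weights: map-level fields (caption, reference_text) count more
# than document-level fields (title, abstract). Values are illustrative only.
FIELD_WEIGHTS = {
    "caption": 3.0,
    "reference_text": 2.0,
    "title": 1.5,
    "abstract": 1.0,
}


@dataclass
class MapRecord:
    map_id: str
    fields: dict  # field name -> extracted text


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def score(record, query):
    """Sum, over all fields, the field weight times the number of query-term hits."""
    terms = set(tokenize(query))
    total = 0.0
    for name, weight in FIELD_WEIGHTS.items():
        tokens = tokenize(record.fields.get(name, ""))
        total += weight * sum(1 for t in tokens if t in terms)
    return total


def rank(records, query):
    """Return ids of maps with a positive score, best first (ties broken by id)."""
    scored = [(score(r, query), r.map_id) for r in records]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [map_id for s, map_id in scored if s > 0]
```

In this sketch a query term found in a caption contributes three times as much as the same term found in the abstract of the containing document, so a map whose caption matches the query outranks one that is only mentioned in a matching document.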

Classement de résultats de recherche grâce au calcul d'une distance d'édition et à l'extraction d'informations documentaires

Ameliorating Search Results Recommendation System Based on K-Means Clustering Algorithm and Distance Measurements

Exploiting Community Feedback for Information Retrieval in Dht Networks

An End-to-End Efficient Lucene-Based Framework of Document/Information Retrieval

Représentation de données et métadonnées dans une bibliothèque virtuelle pour une adéquation avec l'usager et les outils de glanage ou moissonnage scientifique

Enrichissement des contenus par la réindexation des usagers : un état de l'art sur la problématique

Ranking Archived Documents for Structured Queries on Semantic Layers

Improving web search results using affinity graph.

Le travail collaboratif dans le cadre d'un projet architectural

On-Line Selection Of Distinguishing Elements For Focused Information Retrieval

Learning to rank relational objects and its application to web search.

An Efficient Information Extraction Mechanism with Page Ranking and a Classification Strategy based on Similarity Learning of Web Text Documents

Information Retrieval in long documents: Word clustering approach for improving Semantics

A New Method to Query Document Database by Content and Structure

An architecture for non-linear discovery of aggregated multimedia document web search results

Effectively Searching Maps in Web Documents

Distributed Architecture for Large Scale Image-Based Search

A Method to Query Document Database by Content and Structure

Algorithme de recherche approximative dans un dictionnaire fondé sur une distance d'édition définie par blocs

Combinaison d'information visuelle, conceptuelle, et contextuelle pour la construction automatique de hierarchies semantiques adaptees a l'annotation d'images

PCCS: A FAST CLUSTERING AND CLASSIFICATION METHOD FOR WEB DOCUMENT