Abstract: A great number of documents are scanned and archived as digital images in digital libraries to make them available and accessible on the Internet. Information retrieval from these imaged documents has become a growing and challenging problem. To address it, this paper proposes a word image coding technique and describes a web-based system for efficiently retrieving imaged documents from digital libraries. Image preprocessing is first carried out off-line to extract word objects from the imaged documents stored in the digital library. Each word object is then represented by a string of feature codes, so that each document image is represented by a series of feature code strings, one per word, which are stored in a feature code file. Upon receiving a user's request, the server converts the query word into a feature code string using the same conversion mechanism used to produce feature codes for the underlying imaged documents. Searching is then performed over the feature code files generated off-line. An inexact string matching technique, capable of matching a portion of a word, is applied to match the query word against the words in the documents, and the occurrence frequency of the query word in each matching document is then calculated for relevance ranking. Preliminary experimental results on imaged documents of students' theses in the digital library of our university show that the proposed approach is efficient and promising for retrieving imaged documents, with potential applications to digital libraries.
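
The search stage described in the abstract (matching a query's feature code string against the stored strings, allowing a match on a portion of a word, then ranking documents by occurrence frequency) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the abstract does not define the feature code alphabet, so plain letters stand in for feature codes; approximate substring matching by edit distance stands in for the paper's inexact matching technique; and the names `best_portion_distance`, `rank_documents`, and `max_errors` are hypothetical.

```python
from collections import Counter


def best_portion_distance(query, word):
    """Smallest edit distance between `query` and any contiguous portion of `word`.

    Assumption: both arguments are strings of placeholder feature codes.
    The first row is all zeros, so a match may start anywhere in `word`;
    taking the minimum of the last row lets it end anywhere.
    """
    prev = [0] * (len(word) + 1)
    for i, q in enumerate(query, start=1):
        curr = [i]  # matching i query codes against nothing costs i
        for j, w in enumerate(word, start=1):
            cost = 0 if q == w else 1
            curr.append(min(prev[j - 1] + cost,  # match or substitute
                            prev[j] + 1,         # query code left unmatched
                            curr[j - 1] + 1))    # extra code in the word
        prev = curr
    return min(prev)


def rank_documents(query_codes, documents, max_errors=1):
    """Rank documents by how many of their words inexactly contain the query.

    `documents` maps a document id to the list of feature code strings
    of its words (a stand-in for the off-line feature code files).
    Documents with no matching word are omitted from the result.
    """
    counts = Counter()
    for doc_id, words in documents.items():
        counts[doc_id] = sum(
            1 for w in words
            if best_portion_distance(query_codes, w) <= max_errors
        )
    return [(doc_id, n) for doc_id, n in counts.most_common() if n > 0]


# Hypothetical feature code files for three documents.
documents = {
    "thesis_a": ["abcd", "xxabcdyy", "zzzz"],
    "thesis_b": ["abed", "qqqq"],
    "thesis_c": ["mmmm"],
}
ranking = rank_documents("abcd", documents)
```

In this toy data, `thesis_a` contains the query once exactly and once as a portion of a longer word, while `thesis_b` contains one word that differs from the query by a single code, so `thesis_a` ranks first. The tolerance `max_errors` is a tunable assumption; a real system would choose it according to how noisy the extracted feature codes are.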

Information Retrieval in Document Image Databases

Approach to matching partial word image and its application to document image retrieval

Word Searching in Document Images Using Word Portion Matching

Document Image Retrieval Based on Multi-Density Features

Retrieving Imaged Documents In Digital Libraries Based On Word Image Coding

Document retrieval from compressed images

A Web-based System for Retrieving Document Images from Digital Library

A Retrieval Mechanism for Complex Similarity Queries in Image Databases

Document Image Retrieval with Local Feature Sequences

Modeling Image Data for Effective Indexing and Retrieval in Large General Image Databases

Modeling Local Word Spatial Configurations for Near Duplicate Document Image Retrieval

Document Images Retrieval Based on Multiple Features Combination

A Hierarchical Algorithm for Document-Images Fast Matching

A Chinese Document Image Retrieval Method by Keywords

Towards Mobile Document Image Retrieval for Digital Library

Document Image Retrieval Based on Density Distribution Feature and Key Block Feature

Improving Perceptual Matching in Color Image Retrieval

A Stepwise Similarity Approximation of Spatial Constraints for Image Retrieval

A General Approach to Indexing and Retrieval of Images in Image Databases

Bi-Directional Image-Text Retrieval with Position Attention and Similarity Filtering

Information Retrieval beyond the Text Document