Abstract: As more and more documents, especially historical handwritten documents, are converted into digitized versions for long-term preservation, the demand for efficient information retrieval techniques for such document images is increasing. The objective of this research is to establish an effective representation model for handwriting, especially historical manuscripts. The proposed model is intended to support navigation in historical document collections. Specifically, we developed our handwriting representation model with the word spotting application in mind. As a specific pattern recognition task, handwritten word spotting faces many challenges, such as high intra-writer and inter-writer variability. It is now widely acknowledged that OCR techniques perform poorly on offline handwritten documents, especially historical ones. Therefore, characterization and comparison methods dedicated to handwritten word spotting are strongly required. In this work, we explore several techniques that allow the retrieval of single-style handwritten document images given a query image. The proposed representation model covers two facets of handwriting, morphology and topology. Based on the skeleton of the handwriting, graphs are constructed with the structural points as the vertices and the strokes as the edges. By assigning the Shape Context descriptor as the label of each vertex, the contextual information of the handwriting is also integrated. Moreover, we develop a coarse-to-fine system for large-scale handwritten word spotting using our representation model. In the coarse selection, graph embedding is adopted for its simple and fast computation. In the fine selection, a specific similarity measure based on graph edit distance is applied to the selected regions of interest. Because the order of handwriting is important, a dynamic time warping assignment with block merging is added. 
The experimental results on benchmark handwriting datasets demonstrate the effectiveness of the proposed representation model and the efficiency of the developed word spotting approach. The main contribution of this work is the proposed graph-based representation model, which provides a comprehensive description of handwriting, especially historical script. Our structure-based model captures the essential characteristics of handwriting without redundancy, while remaining robust to the intra-writer variation of handwriting and to specific kinds of noise. With additional experiments, we also demonstrate the potential of the proposed representation model in other symbol recognition applications, such as the classification of handwritten musical and architectural symbols.
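To make the order-aware comparison mentioned in the abstract more concrete, the following is a minimal sketch of plain dynamic time warping between two ordered sequences of vertex descriptors. It is not the method of this work: the block merging step and the graph edit distance are omitted, and the function names `chi2_cost` and `dtw_distance`, the chi-square cost, and the toy histograms are all assumptions chosen for illustration.

```python
from typing import List, Sequence


def chi2_cost(a: Sequence[float], b: Sequence[float]) -> float:
    """Chi-square distance between two histograms (assumed vertex cost).

    Bins where both histograms are zero are skipped to avoid division by zero.
    """
    return 0.5 * sum((x - y) ** 2 / (x + y) for x, y in zip(a, b) if x + y > 0)


def dtw_distance(query: List[Sequence[float]],
                 candidate: List[Sequence[float]]) -> float:
    """Classic DTW cost between two ordered descriptor sequences.

    Each sequence is a hypothetical left-to-right list of per-vertex
    histograms (for example, Shape Context labels of graph vertices).
    """
    n, m = len(query), len(candidate)
    inf = float("inf")
    # d[i][j] holds the best cumulative cost of aligning the first i
    # query descriptors with the first j candidate descriptors.
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = chi2_cost(query[i - 1], candidate[j - 1])
            # Allowed moves: advance in query, in candidate, or in both.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


if __name__ == "__main__":
    # Toy two-bin histograms; the candidate repeats its first vertex.
    q = [[1.0, 0.0], [0.0, 1.0]]
    c = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    print(dtw_distance(q, c))
```

In this toy case the repeated descriptor in the candidate is absorbed by the warping path, which illustrates why an elastic alignment can tolerate a stroke being split into more vertices in one writing of a word than in another.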

Word Spotting in Cursive Handwritten Documents using Modified Character Shape Codes

Word Searching in Scene Image and Video Frame in Multi-Script Scenario using Dynamic Shape Coding

Historical Handwriting Representation Model Dedicated to Word Spotting Application

Fast Keyword Spotting in Handwritten Chinese Documents Using Index

Keyword spotting in degraded document using mixed OCR and word shape coding

Word Spotting in Chinese Document Images Without Layout Analysis

Keyword Spotting Simplified: A Segmentation-Free Approach using Character Counting and CTC re-scoring

Handwritten-word spotting using biologically inspired features

Chinese Word Searching in Imaged Documents.

SpottingNet: Learning the Similarity of Word Images with Convolutional Neural Network for Word Spotting in Handwritten Historical Documents

Advanced Digital Image Processing Technique based Optical Character Recognition of Scanned Document

Text recognition in both ancient and cartographic documents

Categorizing ancient documents

Word Searching in Document Images Using Word Portion Matching

Character Spotting Using Machine Learning Techniques

A Method for Segmentation of Cursive Handwritings and Its Application to Character Shape Extraction

Text Segmentation in Degraded Historical Document Images

Character Keypoint-based Homography Estimation in Scanned Documents for Efficient Information Extraction

Keyword searching in compressed document images

Retrieving Imaged Documents In Digital Libraries Based On Word Image Coding

Chinese Calligraphy Word Spotting Using Elastic HOG Feature and Derivative Dynamic Time Warping