Abstract: Historical Tibetan documents, a treasure of traditional Tibetan culture, have received extensive attention from historians, linguists and Buddhist scholars. Tibetan document segmentation and recognition methods convert these documents into digital form, and this digitization is of great significance for the research, protection and inheritance of Tibetan history. This paper proposes an overall segmentation and recognition framework for historical Tibetan document images. First, the historical Tibetan document image is preprocessed to correct uneven illumination, tilt and noise, and is then binarized. Second, we propose a layout segmentation method based on block projection that segments Tibetan document images into text, lines and frames. Third, to handle touching strokes between text lines and curvilinear text lines, we present a text-line segmentation method based on a graph model. Finally, we present a method for segmenting touching Tibetan character strings, after which the Tibetan characters are recognized. Experimental results show that the proposed methods for layout segmentation, text-line segmentation and touching character string segmentation achieve satisfactory performance. The proposed methods can also be applied to other fonts in the Tibetan font family.
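As a rough illustration of the projection idea mentioned in the abstract, the sketch below binarizes a grayscale page with a global threshold and splits it into horizontal bands wherever the row-wise ink count drops to zero. It is not the paper's block-projection or graph-model method: the function names `binarize` and `split_rows_by_projection`, the threshold of 128, and the synthetic page are all assumptions made for illustration only.

```python
import numpy as np


def binarize(gray, threshold=128):
    """Global threshold (an assumed, simplistic choice): True marks dark ink pixels."""
    return np.asarray(gray) < threshold


def split_rows_by_projection(ink, min_ink=1):
    """Return (start, end) row ranges, end exclusive, whose horizontal
    projection (ink pixels per row) is at least min_ink."""
    profile = ink.sum(axis=1)      # ink count for each row
    active = profile >= min_ink    # rows that contain enough ink
    bands = []
    start = None
    for row, on in enumerate(active):
        if on and start is None:
            start = row                    # a band begins
        elif not on and start is not None:
            bands.append((start, row))     # a band ends at the first empty row
            start = None
    if start is not None:                  # band runs to the bottom edge
        bands.append((start, len(active)))
    return bands


# Synthetic 12x20 white page with two dark horizontal strokes.
page = np.full((12, 20), 255, dtype=np.uint8)
page[2:4, 3:17] = 0
page[7:10, 2:18] = 0

bands = split_rows_by_projection(binarize(page))
print(bands)  # [(2, 4), (7, 10)]
```

A whole-page projection like this only separates bands that have an empty row between them, so it cannot split text lines that touch or curve, which are the cases the abstract's graph-model step is meant to address.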

Segmentation and Recognition for Historical Tibetan Document Images
