Abstract: Mongolian is one of the most common written languages in China, Mongolia, and Russia. Many printed Mongolian documents have yet to be digitized for digital library applications. The traditional Mongolian script has a unique vertical cursive writing style and multiple font variations, which make Mongolian Optical Character Recognition (OCR) challenging. Because the traditional Mongolian script has subcomponent characteristics, such that one character may be a constituent of another, in this work we define a novel character set for recognition using segmented components. The components are combined into characters in a rule-based post-processing module. For overall character recognition, a method based on Visual Directional Features and multi-level classifiers is presented. For character segmentation, segmentation points are identified by analyzing the properties of projection profiles and connected components. Mongolian has dozens of printed font types, which can be categorized into two major groups: standard and handwritten-style. The segmentation parameters are adjusted for each group. Additionally, script identification and the relevant character recognition kernels are integrated to recognize Mongolian text mixed with Chinese and English. A novel multi-font printed Mongolian document recognition system based on the proposed methods is implemented. Experiments indicate a text recognition rate of 96.9% on test samples from real documents with multiple font types and mixed scripts. The proposed methods can also be applied to other scripts in the Mongolian script family, such as Todo and Sibe, with significant potential for extension to historic Mongolian documents.
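The projection-profile step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a binarized image of a single vertically written word (1 = ink), and the function names and the `stem_width` parameter are hypothetical. Rows whose ink count does not exceed the assumed stem width are treated as candidate segmentation points; the connected-component analysis and the per-font-group parameter tuning described in the abstract are omitted.

```python
from typing import List


def row_profile(image: List[List[int]]) -> List[int]:
    """Count ink pixels (value 1) in each row of a binary word image."""
    return [sum(row) for row in image]


def candidate_cut_rows(image: List[List[int]], stem_width: int) -> List[int]:
    """Return the middle row of every run of 'thin' rows.

    A row is thin when its ink count is between 1 and stem_width,
    i.e. only the vertical stem joining two components is present.
    stem_width is an assumed, font-dependent parameter.
    """
    profile = row_profile(image)
    cuts: List[int] = []
    run_start = None
    # A trailing 0 acts as a sentinel that closes a run ending at the last row.
    for y, count in enumerate(profile + [0]):
        thin = 0 < count <= stem_width
        if thin and run_start is None:
            run_start = y
        elif not thin and run_start is not None:
            cuts.append((run_start + y - 1) // 2)
            run_start = None
    return cuts


# Toy word image: two wide components joined by a one-pixel stem.
word = [
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
]
print(candidate_cut_rows(word, stem_width=1))  # → [3]
```

In a setup like this, a different `stem_width` would presumably be chosen for the standard and handwritten-style font groups, in the spirit of the per-group parameter adjustment the abstract describes.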

Grapheme Segmentation Based Mongolian Handwriting Recognition

Off-line Recognition of Realistic Chinese Handwriting Using Segmentation-Free Strategy

HMM-Based Recognizer with Segmentation-free Strategy for Unconstrained Chinese Handwritten Text

Multi-font Printed Mongolian Document Recognition System

A Novel Short Merged Off-line Handwritten Chinese Character String Segmentation Algorithm Using Hidden Markov Model

A new dataset for Mongolian online handwritten recognition

HMM-Based System for Transcribing Chinese Handwriting (International Conference on Machine Learning and Cybernetics, Hong Kong, 19-22 August 2007)

An HMM-based Over-Segmentation Method for Touching Chinese Handwriting Recognition

Deep Convolutional Neural Network Based Hidden Markov Model for Offline Handwritten Chinese Text Recognition

Offline Mongolian Handwriting Recognition Based on Data Augmentation and Improved ECA-Net

An On-Line Free Handwritten Chinese Character Recognition Method Based on Component Cascaded HMMs

A hidden Markov model based segmentation and recognition algorithm for Chinese handwritten address character strings

Segmentation-Driven Offline Handwritten Chinese And Arabic Script Recognition

Implicit segmentation of Kannada characters in offline handwriting recognition using hidden Markov models

On-Line Handwritten English Word Recognition Based On Cascade Connection Of Character HMMs

Baseline-independent feature extraction for Arabic writing

A Comprehensive Study of Hybrid Neural Network Hidden Markov Model for Offline Handwritten Chinese Text Recognition

Printed Arabic Character Recognition Using HMM

A segmentation algorithm for handwritten Chinese character strings

Offline handwritten arabic character segmentation with probabilistic model