Takri touching text segmentation using statistical approach

MAGOTRA, SHIKHA
DOI: https://doi.org/10.1007/s12046-023-02150-y
2023-06-15
Sadhana - Academy Proceedings in Engineering Sciences
Abstract: The paper defines a new model for Takri touching text segmentation (T3S), which uses simple statistical operations to locate the exact segmentation column and to produce results faster. An analysis of existing touching-text segmentation techniques for Indian scripts is also provided. The analysis found that recognition-based approaches achieve higher accuracy, but they are computationally expensive and time-consuming, and they work better with very large datasets. Moreover, most of them have been developed for headline Indian scripts only. The proposed T3S model achieves significant segmentation accuracy with a statistical approach and therefore generates results faster. It also works well with small datasets, which makes it practical for India's diverse, ancient, low-resource scripts such as Takri. The model has been implemented on Takri text; Takri belongs to the class of non-headline Indian regional scripts. Thus, it provides a benchmark algorithm for conducting further research in the field. A dataset of 1465 touching consonant pairs in printed Takri script was prepared by applying connected component segmentation to manually collected archival data in printed Takri script. The T3S technique is applied to this dataset, and the results are critically compared, on the basis of the accuracy achieved, with existing touching-text segmentation approaches for Indian scripts.
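
The abstract does not spell out which statistical operations T3S uses, so the following is only an illustrative sketch of the general idea of locating a segmentation column statistically, not the paper's algorithm. It assumes a common baseline, the vertical projection profile: count the ink pixels in each column of a binarized image of a touching pair and cut at the column with the least ink inside a central band. The function names `find_cut_column` and `split_pair` and the `margin` parameter are hypothetical and chosen for this example.

```python
import numpy as np


def find_cut_column(binary, margin=0.25):
    """Return the index of the column with the least ink in the central band.

    binary: 2-D array, 1 for ink and 0 for background (assumed input format).
    margin: fraction of the width ignored on each side (assumed heuristic).
    """
    ink_per_column = binary.sum(axis=0)  # vertical projection profile
    width = binary.shape[1]
    lo = int(width * margin)
    hi = width - lo
    # np.argmin returns the first minimum inside the band.
    return lo + int(np.argmin(ink_per_column[lo:hi]))


def split_pair(binary, margin=0.25):
    """Split a touching pair into left and right parts at the cut column."""
    cut = find_cut_column(binary, margin)
    return binary[:, :cut], binary[:, cut:]


# Synthetic example: two solid blocks joined by a one-pixel-high bridge.
image = np.zeros((10, 20), dtype=np.uint8)
image[:, 2:9] = 1     # left "character"
image[:, 11:18] = 1   # right "character"
image[4, 9:11] = 1    # thin bridge where the two touch

cut = find_cut_column(image)  # 9, the first bridge column
left, right = split_pair(image)
```

A projection-profile cut like this needs no training data, which matches the abstract's point about small datasets, but it can fail when the touching region is thick or slanted. The abstract does not say how T3S handles such cases.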