Fast Recurrent Neural Network with Bi-LSTM for Handwritten Tamil text segmentation in NLP

C. Vinotheni, Lakshmana Pandian S.
DOI: https://doi.org/10.1145/3643808
IF: 1.471
2024-02-07
ACM Transactions on Asian and Low-Resource Language Information Processing
Abstract: Tamil text segmentation is a long-standing challenge in language understanding. It involves dividing a document into contiguous segments based on its semantic structure, and each segment is meaningful in its own right. Depending on the purpose of the text analysis, the segments may be text groups, sentences, phrases, words, characters, or any other unit of data. This research performs the segmentation with a fast recurrent neural network and presents deep-learning-based text segmentation methods for natural language processing (NLP). The study proposes a bidirectional long short-term memory (Bi-LSTM) neural network prototype in which a fast recurrent neural network (FRNN) learns embeddings of Tamil text groups, and phrases are segmented using text-oriented data. As a result, the prototype can handle variable-sized context data, and the work provides a large new dataset for the automatic segmentation of Tamil text. Using this dataset as a basis, we also develop a segmentation prototype and show how well it generalizes to unseen natural text. With Bi-LSTM, FRNN achieves higher segmentation precision than other segmentation approaches, although it still falls short of certain other techniques. In the proposed framework, every text is scaled to the required size so that it is immediately available for training; that is, each word in a scaled Tamil text is used to train the neural network as segmented content. The results show that the proposed framework achieves high segmentation rates on handwritten material, nearly equivalent to those of segmentation-based schemes.
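The abstract names a Bi-LSTM architecture without giving its details. One common way to use a Bi-LSTM for segmentation is per-token boundary tagging: every position receives a probability that a segment ends after it, and the tags are then decoded into segments. The sketch below illustrates only that general idea under loud assumptions. It is not the authors' FRNN or their pipeline: it runs in pure Python with untrained random weights, uses Tamil word tokens with random embeddings instead of handwritten input, and all names (`LSTMCell`, `bilstm`, `decode_segments`, `segment`) are hypothetical.

```python
import math
import random

# ASSUMPTION: illustrative Bi-LSTM boundary tagger, not the paper's FRNN model.
# Weights and embeddings are random (untrained), so the segments it prints are arbitrary.


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Dense matrix-vector product on nested lists.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class LSTMCell:
    """Standard LSTM cell with random (untrained) weights, for illustration only."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        n = input_size + hidden_size
        # One weight matrix and bias per gate: input (i), forget (f), output (o), candidate (g).
        self.W = {k: [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(hidden_size)]
                  for k in "ifog"}
        self.b = {k: [0.0] * hidden_size for k in "ifog"}

    def step(self, x, h_prev, c_prev):
        z = x + h_prev  # list concatenation gives [x; h_prev]
        pre = {k: [a + b for a, b in zip(matvec(self.W[k], z), self.b[k])] for k in "ifog"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        g = [math.tanh(v) for v in pre["g"]]
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h, c


def run_lstm(cell, xs):
    # Run one cell over the sequence from zero initial state; return every hidden state.
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    states = []
    for x in xs:
        h, c = cell.step(x, h, c)
        states.append(h)
    return states


def bilstm(fwd, bwd, xs):
    # Left-to-right pass, plus right-to-left pass re-aligned to input order, concatenated.
    forward = run_lstm(fwd, xs)
    backward = run_lstm(bwd, xs[::-1])[::-1]
    return [hf + hb for hf, hb in zip(forward, backward)]


def decode_segments(tokens, is_end):
    # is_end[i] == True means a segment closes after tokens[i].
    segments, current = [], []
    for tok, end in zip(tokens, is_end):
        current.append(tok)
        if end:
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments


def segment(tokens, embed, fwd, bwd, w_out, threshold=0.5):
    states = bilstm(fwd, bwd, [embed[t] for t in tokens])
    # Per-token probability that a segment boundary follows this token.
    probs = [sigmoid(sum(w * s for w, s in zip(w_out, st))) for st in states]
    return decode_segments(tokens, [p > threshold for p in probs]), probs


rng = random.Random(0)
EMB, HID = 4, 3
tokens = "அவன் பள்ளிக்கு சென்றான் அவள் வீட்டில் இருந்தாள்".split()
# dict.fromkeys keeps first-seen order, so embeddings are reproducible for a fixed seed.
embed = {t: [rng.uniform(-1, 1) for _ in range(EMB)] for t in dict.fromkeys(tokens)}
fwd, bwd = LSTMCell(EMB, HID, rng), LSTMCell(EMB, HID, rng)
w_out = [rng.uniform(-1, 1) for _ in range(2 * HID)]
segments, probs = segment(tokens, embed, fwd, bwd, w_out)
print(segments)
```

The bidirectional pass is what lets each boundary decision see both the preceding and the following tokens; in a trained model, `w_out` and the cell weights would be learned from labelled segment boundaries rather than drawn at random.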
computer science, artificial intelligence