Abstract: Estimating text coherence is one of the most pressing tasks in computational linguistics. Coherence analysis is widely used in writing and selecting documents, since a coherent text conveys the author's idea clearly to the reader. The importance of the task is confirmed by the number of recent works dedicated to solving it. Automated methods for estimating text coherence rely on machine learning: they build a formal representation of the text and then detect regularities in it to generate an output result. The purpose of this work is to review automated methods for estimating text coherence analytically; to justify the selection of a method and adapt it to the features of the Ukrainian language; and to verify experimentally the effectiveness of the suggested method on a Ukrainian corpus. The paper presents a comparative analysis of machine-learning-based methods for estimating the coherence of English texts. It justifies the use of methods based on trained universal models for the formalized representation of text components. Two neural network architectures are considered: recurrent and convolutional networks. Both are widely used for text processing because they can handle input without a fixed structure, such as sentences or words. Although recurrent neural networks can take previous data into account (behavior similar to how a reader perceives text), a convolutional neural network was chosen for the experimental research because convolutional networks can detect relations between entities regardless of the distance between them.
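As a rough illustration of why convolutional networks cope with input of unfixed length, the sketch below applies a one-dimensional convolution over a sequence of word vectors and then takes the maximum over all positions. The function name `conv1d_max_pool`, the toy two-dimensional embeddings, and the single hand-written filter are assumptions made for this example; it is not the architecture used in the paper.

```python
from typing import List

Vector = List[float]


def conv1d_max_pool(embeddings: List[Vector], filters: List[List[Vector]]) -> Vector:
    """Slide each filter over the word vectors, apply ReLU, keep the maximum.

    `embeddings` holds one vector per word; each filter is a list of
    `width` weight vectors with the same dimensionality as the embeddings.
    The output has one value per filter, whatever the sentence length.
    """
    features = []
    for filt in filters:
        width = len(filt)
        if len(embeddings) < width:
            raise ValueError("sentence is shorter than the filter width")
        activations = []
        for start in range(len(embeddings) - width + 1):
            window = embeddings[start:start + width]
            # Dot product between the filter and the current window of words.
            total = sum(
                w * x
                for weight_row, word_vec in zip(filt, window)
                for w, x in zip(weight_row, word_vec)
            )
            activations.append(max(0.0, total))  # ReLU
        features.append(max(activations))  # max-over-time pooling
    return features


# Hypothetical toy data: 2-dimensional "embeddings" and one filter of width 2.
toy_filter = [[[1.0, 0.0], [0.0, 1.0]]]
short_sentence = [[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]]
long_sentence = short_sentence + [[0.5, 0.5], [0.0, 1.0]]

short_features = conv1d_max_pool(short_sentence, toy_filter)
long_features = conv1d_max_pool(long_sentence, toy_filter)
```

Both calls return a vector with one entry per filter, so sentences of different lengths can feed the same downstream layers.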
The paper describes the principle of the method based on a convolutional neural network and the corresponding architecture. A software application was created to verify the effectiveness of the suggested method. Text elements were given a formalized representation using a previously trained model for the semantic representation of words; this model was trained on a corpus of Ukrainian scientific abstracts. The resulting networks were then trained using the pre-trained model. The effectiveness of the method was verified experimentally on a set of scientific articles for the document discrimination task and the insertion task. The results obtained suggest that the method using convolutional neural networks can be used for further estimation of the coherence of Ukrainian texts.
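The two evaluation tasks named above can be sketched as follows, under stated assumptions. In document discrimination, a coherence scorer should rate the original sentence order above random permutations of it; in the insertion task, a sentence is removed and the scorer should prefer its original position among all candidate positions. The scorer `toy_coherence` (word overlap between adjacent sentences) is a hypothetical stand-in for the trained network, and all function names are illustrative.

```python
import random
from typing import Callable, List, Sequence

Scorer = Callable[[Sequence[str]], float]


def toy_coherence(sentences: Sequence[str]) -> float:
    """Placeholder scorer: mean Jaccard word overlap of adjacent sentences."""
    if len(sentences) < 2:
        return 0.0
    total = 0.0
    for first, second in zip(sentences, sentences[1:]):
        a, b = set(first.lower().split()), set(second.lower().split())
        total += len(a & b) / len(a | b) if (a | b) else 0.0
    return total / (len(sentences) - 1)


def discrimination_accuracy(doc: List[str], score: Scorer,
                            n_permutations: int = 20, seed: int = 0) -> float:
    """Fraction of shuffled versions that score strictly below the original."""
    rng = random.Random(seed)
    original_score = score(doc)
    wins = trials = 0
    for _ in range(n_permutations):
        shuffled = list(doc)
        rng.shuffle(shuffled)
        if shuffled == doc:  # skip permutations identical to the original
            continue
        trials += 1
        if original_score > score(shuffled):
            wins += 1
    return wins / trials if trials else 0.0


def insertion_correct(doc: List[str], index: int, score: Scorer) -> bool:
    """Remove sentence `index`, try every position, check the best one."""
    removed = doc[index]
    rest = doc[:index] + doc[index + 1:]
    best = max(range(len(rest) + 1),
               key=lambda pos: score(rest[:pos] + [removed] + rest[pos:]))
    return best == index


# Hypothetical toy document in which each sentence shares a word with the next.
toy_doc = ["a b", "b c", "c d", "d e"]
accuracy = discrimination_accuracy(toy_doc, toy_coherence)
found_position = insertion_correct(toy_doc, 1, toy_coherence)
```

Replacing `toy_coherence` with the output of a trained model leaves both evaluation loops unchanged, because they depend only on a function that maps a list of sentences to a score.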

METHOD FOR COHERENCE EVALUATION OF UKRAINIAN TEXTS USING CONVOLUTIONAL NEURAL NETWORK