Abstract: With the emergence of a large number of full-text scientific papers, extracting useful information from them benefits knowledge-based organizations and supports the accurate retrieval of academic literature. Recognizing the structure of academic text underpins this work, because it helps in comprehending documents at a deeper, semantic level and thereby promotes research into academic text mining. This paper takes the structural functions of academic texts as its research object and uses 1579 papers from the Journal of the Association for Information Science and Technology as the dataset. It compares three types of models: a bidirectional long short-term memory neural network, a support vector machine, and conditional random fields. The conditional random field model was selected for the subsequent experiments. On this basis, the problem of recognizing the functional structure of academic texts was transformed into one of labeling a sequence of sentence units. The best model achieved an average F-measure of 92.88% on the open test, and the effect of different features on structure recognition was explored. The experimental results showed that the lexical information in chapter titles and the feature words within chapters play an important role in recognizing the functional structure of academic texts, and the approach produced satisfactory results. However, the length of a structure affected the performance of the conditional random fields method. The causes of the recognition errors are summarized, together with the limitations of the study and plans for further work.
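The reformulation described in the abstract, in which each sentence becomes one unit in a labeled sequence, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the feature names, and the `CUE_WORDS` lexicon are all hypothetical, chosen only to show how chapter-title words and in-chapter feature words might be turned into per-sentence feature dictionaries of the kind a linear-chain CRF toolkit typically consumes.

```python
from typing import Dict, List

# Hypothetical cue-word lexicon for illustration only; the paper's actual
# feature words are not given in the abstract.
CUE_WORDS = {"propose", "method", "results", "conclude", "dataset", "future"}


def sentence_features(sentences: List[str], section_title: str, i: int) -> Dict[str, object]:
    """Build a feature dict for the i-th sentence of one section."""
    features: Dict[str, object] = {
        "bias": 1.0,
        # Relative position of the sentence inside its section (0.0 to 1.0).
        "rel_position": round(i / max(len(sentences) - 1, 1), 2),
        "is_first": i == 0,
        "is_last": i == len(sentences) - 1,
    }
    # Lexical information from the chapter title.
    for word in section_title.lower().split():
        features[f"title_word={word}"] = True
    # Feature words that occur inside the sentence itself.
    for token in sentences[i].lower().split():
        word = token.strip(".,;:()")
        if word in CUE_WORDS:
            features[f"cue_word={word}"] = True
    return features


def section_to_sequence(sentences: List[str], section_title: str) -> List[Dict[str, object]]:
    """Turn a section into a sequence of sentence-level feature dicts."""
    return [sentence_features(sentences, section_title, i) for i in range(len(sentences))]


if __name__ == "__main__":
    section = [
        "We propose a CRF-based method.",
        "The dataset contains 1579 papers.",
    ]
    for feats in section_to_sequence(section, "Methodology"):
        print(feats)
```

Each section yields one sequence of feature dictionaries, and a sequence model would then assign one structural-function label per sentence. A sentence-level classifier such as a support vector machine would instead label each dictionary independently, without using the neighboring labels.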

Automatic Labeling of Semantic Clauses in Research Articles

Research on the Structure Recognition of Academic Texts Under Different Characteristics

Enhancing Identification of Structure Function of Academic Articles Using Contextual Information

Intelligent Segmentation Framework and Data Hierarchy of Chinese Language and Literature Based on Semantic Recognition

Semantic Analysis and Structured Language Models

Multi-documents Automatic Abstracting Based on Text Clustering and Semantic Analysis

Semantic Role Labeling Integrated with Multilevel Linguistic Cues and Bi-LSTM-CRF

Query-focused Summarisation in Research Articles Based on Semantic Function of Sentences

Automatic sentence segmentation for classical Chinese: The Spring and Autumn Annals as an example

Semantic Component Analysis: Discovering Patterns in Short Texts Beyond Topics

Understanding the Logical and Semantic Structure of Large Documents

A Framework For Refining Text Classification and Object Recognition from Academic Articles

Learning from syntax generalizations for automatic semantic annotation

Semantic-based Automatic Text Classification Method

Automatic Labeling of Topic Models Using Text Summaries

Revealing the Importance of Semantic Retrieval for Machine Reading at Scale

Clustering articles based on semantic similarity

Clustering of Chinese Sentences Using the SMM Model

Text Classification Via Learning Semantic Dependency and Association

Automatic semantic modeling of structured data sources with cross-modal retrieval

On Conceptual Labeling of a Bag of Words.