Abstract: With the emergence of a large number of full-text scientific theses, extracting useful information from these volumes benefits knowledge-based organizations and supports the accurate retrieval of academic literature. Recognizing the structure of academic text is the basis for this work, because structure recognition helps readers and systems understand these documents at a deeper, semantic level and thereby promotes research into academic text mining. This paper takes the different structural functions of academic texts as its research objects and uses 1579 papers from the Journal of the Association for Information Science and Technology as the dataset. It compares three types of models, namely a bidirectional long short-term memory neural network, a support vector machine, and conditional random fields; the conditional random field was selected for the subsequent experiments. Based on this approach, the problem of recognizing the functional structure of academic texts was recast as labeling a sequence of sentence units. The best model achieved an average F-measure of 92.88% on the open test, and the effect of different features on the structure recognition problem was explored. The experimental results showed that the lexical information in chapter titles and the feature words within chapters play an important role in recognizing the functional structure of academic texts, and that the approach produces satisfactory results. However, the length of a structure affected the performance of the conditional random field method. The paper also summarizes the causes of recognition errors, the limitations of the study, and plans for further work.
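The sequence-labeling formulation described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the cue-word lists, the feature names, and the helper functions `sentence_features` and `document_features` are all hypothetical. It only shows how chapter-title words and in-chapter feature words might be turned into per-sentence feature dictionaries, the input shape that CRF toolkits such as sklearn-crfsuite typically accept.

```python
import re
from typing import Dict, List

# Hypothetical cue words per functional section (assumed, not from the paper).
CUE_WORDS = {
    "introduction": {"motivation", "background", "aim"},
    "method": {"dataset", "model", "algorithm"},
    "result": {"accuracy", "f-measure", "table"},
    "conclusion": {"future", "limitation", "conclude"},
}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens (hyphens kept inside words)."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def sentence_features(sentences: List[str], titles: List[str], i: int) -> Dict[str, object]:
    """Build a feature dict for sentence i; titles[i] is the title of its chapter."""
    tokens = tokenize(sentences[i])
    feats: Dict[str, object] = {
        "bias": 1.0,
        # Relative position of the sentence in the document, from 0.0 to 1.0.
        "position": round(i / max(len(sentences) - 1, 1), 1),
        "length": len(tokens),
    }
    # Lexical information from the chapter title.
    for word in tokenize(titles[i]):
        feats[f"title_word={word}"] = True
    # Feature (cue) words that occur inside the sentence.
    for label, cues in CUE_WORDS.items():
        if cues & set(tokens):
            feats[f"cue={label}"] = True
    # Simple context feature: does the previous sentence share the same chapter?
    if i == 0:
        feats["BOS"] = True
    else:
        feats["prev_title_same"] = titles[i - 1] == titles[i]
    return feats


def document_features(sentences: List[str], titles: List[str]) -> List[Dict[str, object]]:
    """Turn one document into a sequence of feature dicts, one per sentence."""
    return [sentence_features(sentences, titles, i) for i in range(len(sentences))]
```

A list of such sequences, paired with one functional label per sentence (for example introduction, method, result, conclusion), would form the training data for a linear-chain conditional random field. The label set and feature choices here are assumptions made for illustration only.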

Automatic Identification of Subjects for Textual Documents in Digital Libraries

Functional Structure Identification of Scientific Documents in Computer Science

An automatic approach for efficient text segmentation

A Method of Hierarchical Document Automatic Classification in E-Research

Chinese Documents Categorization Based on N-gram Information

Topic Detection Technology for Chinese Text Based on Statistics and Semantic Information

Multi-documents Automatic Abstracting Based on Text Clustering and Semantic Analysis

Automatic Keywords Extraction Based on Co-Occurrence and Semantic Relationships Between Words

Influence of Part-of-Speech on Chinese and English Document Clustering

Identification of Chinese Personal Names in Unrestricted Texts

Text Detection In Born-Digital Images By Mass Estimation

Chinese Document Categorization without Dictionary Support and Segmentation Processing

Machine Identification of High Impact Research through Text and Image Analysis

Research on the Structure Recognition of Academic Texts Under Different Characteristics

A Classification Framework of Identifying Major Documents with Search Engine Suggestions and Unsupervised Subtopic Clustering

Automatic extraction of titles from general documents using machine learning

Automatic Multi-Document Summarization for Digital Libraries

Classifying document types to enhance search and recommendations in digital libraries

Multi-document Chinese Name Disambiguation Based on Latent Semantic Analysis

A computational model implementing subjectivity with the 'Room Theory'. The case of detecting Emotion from Text

Aripiprazole in the acute treatment of male patients with schizophrenia: effectiveness, acceptability, and risks in the inner-city hospital setting