Research on the Structure Recognition of Academic Texts Under Different Characteristics
Wang Dongbo, Gao Ruiqing, Ye Wenhao, Zhou Xin, Zhu Danhao
DOI: https://doi.org/10.3772/j.issn.1000-0135.2018.10.004
2018-01-01
Abstract: With the emergence of a large number of full-text scientific papers, extracting useful information from them benefits knowledge organization and supports the accurate retrieval of academic literature. Recognizing the structure of academic texts is the basis for this work, because it supports a deeper, semantic-level understanding of these documents and thereby promotes research on academic text mining. This paper takes the functional structures of academic texts as its research object and uses 1579 papers from the Journal of the Association for Information Science and Technology as the dataset. It compares three types of models, namely a bidirectional long short-term memory neural network, a support vector machine, and conditional random fields, and selects conditional random fields for the subsequent experiments. On this basis, the problem of recognizing the functional structure of academic texts is transformed into the problem of labeling a sequence of sentence units. The best model achieved an average F-measure of 92.88% on the open test, and the effect of different features on structure recognition was explored. The experimental results show that the lexical information in chapter titles and the feature words within chapters play an important role in recognizing the functional structure of academic texts and yield satisfactory results. However, the length of a structure affected the performance of the conditional random fields method. Finally, the causes of recognition errors are summarized, together with the limitations of the study and plans for further work.
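The sequence formulation described in the abstract can be illustrated with a minimal sketch. It assumes each paper is a list of (chapter title, sentence) pairs and turns every sentence unit into a feature dictionary built from chapter-title words and in-chapter feature words, the two feature families the abstract highlights. The cue-word list, the feature names, the relative-position feature, and the labels in the usage note are hypothetical; the abstract does not specify the actual feature templates or label set. The `f_measure` helper uses the standard per-label definition (the harmonic mean of precision and recall).

```python
from typing import Dict, List, Tuple

# Hypothetical feature words; the paper's actual word lists are not given in the abstract.
CUE_WORDS = {"propose", "dataset", "experiment", "results", "conclude"}

Sentence = Tuple[str, str]  # (chapter title, sentence text)


def sentence_features(paper: List[Sentence], i: int) -> Dict[str, object]:
    """Build a feature dict for the i-th sentence unit of one paper."""
    title, text = paper[i]
    feats: Dict[str, object] = {
        # Relative position of the sentence in the paper (assumed feature).
        "rel_pos": round(i / max(len(paper) - 1, 1), 2),
    }
    # Lexical information from the chapter title.
    for word in title.lower().split():
        feats["title=" + word] = True
    # Feature words that occur inside the chapter text.
    for token in text.lower().split():
        token = token.strip(".,;:()")
        if token in CUE_WORDS:
            feats["cue=" + token] = True
    # Simple context feature: does the previous sentence share this chapter?
    if i == 0:
        feats["BOS"] = True
    else:
        feats["same_chapter_as_prev"] = paper[i - 1][0] == title
    return feats


def paper_to_sequence(paper: List[Sentence]) -> List[Dict[str, object]]:
    """Convert one paper into a sequence of per-sentence feature dicts."""
    return [sentence_features(paper, i) for i in range(len(paper))]


def f_measure(gold: List[str], pred: List[str], label: str) -> float:
    """Per-label F-measure: harmonic mean of precision and recall."""
    tp = sum(g == label and p == label for g, p in zip(gold, pred))
    fp = sum(g != label and p == label for g, p in zip(gold, pred))
    fn = sum(g == label and p != label for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

A sequence produced by `paper_to_sequence`, paired with one functional-structure label per sentence, is the kind of input a conditional random fields toolkit typically consumes. For example, with the hypothetical labels `"I"` and `"M"`, `f_measure(["I", "I", "M", "M"], ["I", "M", "M", "M"], "M")` has precision 2/3 and recall 1, so the F-measure is 0.8.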