Abstract: The Book of Songs is the earliest anthology of poetry in China and one of the Thirteen Classics of the Confucian tradition. It is ranked first among the ancient canonical Five Classics: the Yijing ("Classic of Changes"), the Shujing ("Classic of History"), The Book of Songs, the Collection of Rituals, and the Chunqiu ("Spring and Autumn Annals"). The Book of Songs is rich in content, reflecting all aspects of social life in the Zhou Dynasty: labor and love, war and corvée, oppression and rebellion, customs and marriage, ancestor worship and banquets, and even astronomy, geomorphology, animals, and plants. It is a mirror of Zhou Dynasty society, known as "The Life Encyclopedia of Ancient Society". Moreover, The Book of Songs served as a textbook of ancient Chinese political ethics, aesthetic education, and naturalism. With the extensive application of humanities computing, this paper combines the Sinological Index Series with the domain knowledge of the Mao Shi Index and studies the automatic word segmentation of The Book of Songs using machine learning. Based on a manually segmented corpus of The Book of Songs, the Guang Yun is combined with statistical analysis to obtain 23 sets of feature templates that fuse different kinds of feature knowledge, and machine learning segmentation models are then trained on them. The performance of each segmentation model is analyzed; lexical features are found to have the greatest influence on segmentation quality for The Book of Songs, and the F value (the harmonic mean of precision and recall) of the segmentation models reaches as high as 97.42%. Finally, the paper uses the domain glossary of the Mao Shi Index to post-process the output of the best-performing segmentation model, correcting long words, and obtains a word corpus of The Book of Songs that incorporates the expert vocabulary knowledge of the Mao Shi Index.
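For readers unfamiliar with how an F value is computed for word segmentation, the following is a minimal sketch, not the authors' evaluation script. It assumes the common convention that a predicted word counts as correct only when both of its boundaries match the gold segmentation. The function names `to_spans` and `segmentation_prf` and the toy segmentations are illustrative assumptions, not taken from the paper.

```python
def to_spans(words):
    """Convert a list of words into a set of (start, end) character spans."""
    spans, start = set(), 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def segmentation_prf(gold_sents, pred_sents):
    """Return word-level precision, recall, and F value over paired sentences."""
    correct = n_gold = n_pred = 0
    for gold, pred in zip(gold_sents, pred_sents):
        gold_spans, pred_spans = to_spans(gold), to_spans(pred)
        correct += len(gold_spans & pred_spans)  # words with both boundaries right
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    total = precision + recall
    f_value = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_value


# Toy data: an assumed (illustrative) segmentation of the anthology's opening line.
gold = [["關關", "雎鳩"], ["在", "河", "之", "洲"]]
pred = [["關關", "雎", "鳩"], ["在", "河", "之", "洲"]]
precision, recall, f_value = segmentation_prf(gold, pred)
```

In this toy case five of the seven predicted words and five of the six gold words match, so the F value is 10/13, roughly 0.77.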
This article integrates multi-dimensional domain knowledge to realize the automatic segmentation of The Book of Songs, providing a reference for related research on Pre-Qin poetry and inspiration for the study of automatic word segmentation of the Pre-Qin Classics. The word corpus of The Book of Songs, as part of the Pre-Qin Classics word corpus, supports further knowledge mining of the Pre-Qin Classics.
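The abstract does not spell out how the glossary-based long-word correction works. One plausible reading, sketched below purely as an assumption, is a greedy pass that merges adjacent tokens whenever their concatenation is a glossary entry, trying the longest candidate first. The function `merge_long_words`, the `max_chars` limit, and the one-entry glossary are hypothetical and stand in for the Mao Shi Index vocabulary.

```python
def merge_long_words(words, glossary, max_chars=4):
    """Greedily merge adjacent tokens whose concatenation is a glossary entry.

    Assumption: longer glossary matches are preferred over shorter ones.
    """
    merged, i = [], 0
    while i < len(words):
        for j in range(len(words), i + 1, -1):  # candidates of at least two tokens
            candidate = "".join(words[i:j])
            if len(candidate) <= max_chars and candidate in glossary:
                merged.append(candidate)
                i = j
                break
        else:
            # No glossary entry starts here; keep the token unchanged.
            merged.append(words[i])
            i += 1
    return merged


# Hypothetical glossary entry and an over-segmented model output.
glossary = {"君子"}
model_output = ["君", "子", "好", "逑"]
corrected = merge_long_words(model_output, glossary)
# corrected == ["君子", "好", "逑"]
```

A pass like this only ever joins tokens, so it can repair over-segmentation of glossary words but cannot split words the model wrongly merged.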

When Classical Chinese Meets Machine Learning: Explaining the Relative Performances of Word and Sentence Segmentation Tasks

Classical Chinese Sentence Segmentation for Tomb Biographies of Tang Dynasty

Automatic sentence segmentation for classical Chinese: The Spring and Autumn Annals as an example

A Sentence Segmentation Method for Ancient Chinese Texts Based on NNLM

A Pragmatic Approach for Classical Chinese Word Segmentation

Chinese Sentiment Analysis Exploiting Heterogeneous Segmentations

That Slepen Al the Nyght with Open Ye! Cross-era Sequence Segmentation with Switch-memory

A Study on Natural Typing Annotations for Building Corpus of Chinese Word Segmentation

Ancient Chinese Sentence Segmentation Based on Bidirectional LSTM+CRF Model

Improving Chinese Word Segmentation Using Partially Annotated Sentences

Word Segmentation for Classical Chinese Buddhist Literature

Deep Learning for Chinese Word Segmentation and POS Tagging

Exploring Multiple Chinese Word Segmentation Results Based on Linear Model

Survey on Chinese Word Segmentation

Chinese Word Segmentation Without Using Lexicon and Hand-Crafted Training Data

Multi-Scale TextTiling for Automatic Story Segmentation in Chinese Broadcast News

Onto Word Segmentation of the Complete Tang Poems

Research on the Automatic Word Segmentation of The Book of Songs under Multi-dimensional Domain Knowledge

Capsules Based Chinese Word Segmentation for Ancient Chinese Medical Books

A Hybrid Approach to the Real World Text Segmentation

Chinese Word Segmentation Method for Domain-Special Machine Translation