Abstract:

Purpose: Recent trends have shown the integration of Chinese word segmentation (CWS) and part-of-speech (POS) tagging to enhance syntactic and semantic parsing. However, the potential utility of hierarchical and structural information in these tasks remains underexplored. This study aims to leverage multiple external knowledge sources (e.g. syntactic and semantic features, lexicons) through various modules for the joint task.

Design/methodology/approach: We introduce a novel learning framework for the joint CWS and POS tagging task, utilizing graph convolutional networks (GCNs) to encode syntactic structure and semantic features. The framework also incorporates a pre-defined lexicon through a lexicon attention module. We evaluate our model on a range of public corpora, including CTB5, PKU and UD, the novel ZX dataset and the comprehensive CTB9 dataset.

Findings: Experimental results on these benchmark corpora demonstrate the effectiveness of our model in improving the performance of the joint task. Notably, we find that syntax information significantly enhances performance, while lexicon information helps mitigate the issue of out-of-vocabulary (OOV) words.

Originality/value: This study introduces a comprehensive approach to the joint CWS and POS tagging task by combining multiple features. Moreover, the proposed framework offers potential adaptability to other sequence labeling tasks, such as named entity recognition (NER).
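To make the idea of encoding syntactic structure with a GCN concrete, the following is a minimal sketch of a single generic graph-convolution step over a character-level graph. It is not the paper's architecture, which this abstract does not specify: the function name `gcn_layer`, the toy adjacency matrix, the feature vectors and the weights are all hypothetical, and the symmetric normalisation shown is the standard textbook form rather than a confirmed detail of the model.

```python
import math


def gcn_layer(adj, feats, weight):
    """One generic GCN step: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adj    : n x n adjacency matrix (e.g. derived from a dependency parse; assumed)
    feats  : n x d_in node features, one row per character
    weight : d_in x d_out projection matrix (learned in a real model)
    """
    n = len(adj)
    d_in = len(feats[0])
    d_out = len(weight[0])

    # Add self-loops so every node keeps part of its own representation.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]

    # Symmetric normalisation by node degree.
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

    # Aggregate neighbour features: norm @ feats.
    agg = [[sum(norm[i][k] * feats[k][d] for k in range(n)) for d in range(d_in)]
           for i in range(n)]

    # Linear projection followed by ReLU: max(0, agg @ weight).
    return [[max(0.0, sum(agg[i][d] * weight[d][o] for d in range(d_in)))
             for o in range(d_out)]
            for i in range(n)]


# Hypothetical toy input: three characters, with a syntactic edge
# between characters 0 and 1 and no edge touching character 2.
toy_adj = [[0.0, 1.0, 0.0],
           [1.0, 0.0, 0.0],
           [0.0, 0.0, 0.0]]
toy_feats = [[1.0, 0.0],
             [0.0, 1.0],
             [2.0, 2.0]]
toy_weight = [[1.0, 0.0],
              [0.0, 1.0]]  # identity, so the output shows pure aggregation

toy_out = gcn_layer(toy_adj, toy_feats, toy_weight)
```

In this toy case the two connected characters end up with the average of their features, while the unconnected character keeps its own. A real model would learn the weights and typically stack such layers on top of contextual character embeddings.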

The Uncertainty-based Retrieval Framework for Ancient Chinese CWS and POS

Ancient Chinese Word Segmentation and Part-of-Speech Tagging Using Distant Supervision

TURNER: The Uncertainty-based Retrieval Framework for Chinese NER

Unified Framework of Performing Chinese Word Segmentation and Part-Of-Speech Tagging

A Pragmatic Approach for Classical Chinese Word Segmentation

A cross-temporal contrastive disentangled model for ancient Chinese understanding

A Unified Model for Joint Chinese Word Segmentation and POS Tagging with Heterogeneous Annotation Corpora

Unsupervised Chinese Word Segmentation with BERT Oriented Probing and Transformation

Domain-Aware Word Segmentation for Chinese Language: A Document-Level Context-Aware Model

Unsupervised segmentation of Chinese corpus using accessor variety

Word Segmentation for Classical Chinese Buddhist Literature

Incorporating Knowledge for Joint Chinese Word Segmentation and Part-of-speech Tagging with SynSemGCN

A Study in Dictionary-Based All-word Word Sense Disambiguation for Pre-Qin Chinese

Automatic Corpus Expansion for Chinese Word Segmentation by Exploiting the Redundancy of Web Information

Time-Aware Ancient Chinese Text Translation and Inference

Parsing-based Chinese word segmentation integrating morphological and syntactic information

A Graph-based Model for Joint Chinese Word Segmentation and Dependency Parsing

That Slepen Al the Nyght with Open Ye! Cross-era Sequence Segmentation with Switch-memory

TopWORDS-Seg: Simultaneous Text Segmentation and Word Discovery for Open-Domain Chinese Texts via Bayesian Inference

A Sentence Segmentation Method for Ancient Chinese Texts Based on NNLM

Incorporating Uncertain Segmentation Information into Chinese NER for Social Media Text