Abstract:
Background: Chinese word segmentation (CWS) and part-of-speech (POS) tagging are two fundamental tasks in Chinese text processing and are usually preliminary steps for many Chinese natural language processing (NLP) tasks. CWS and POS tagging have been studied extensively in various domains; however, few studies have addressed them in the clinical domain, because the granularity of words there is not easy to determine.
Methods: In this paper, we investigated CWS and POS tagging for Chinese clinical text at a fine-granularity level and manually annotated a corpus. On this corpus, we compared two state-of-the-art methods: conditional random fields (CRF) and bidirectional long short-term memory (BiLSTM) with a CRF layer (BiLSTM-CRF). To validate the plausibility of the fine-grained annotation, we further investigated the effect of CWS and POS tagging on Chinese clinical named entity recognition (NER) on another, independent corpus.
Results: When only CWS was considered, CRF achieved higher precision, recall and F-measure than BiLSTM-CRF. When both CWS and POS tagging were considered, CRF also held an advantage over BiLSTM-CRF. CRF outperformed BiLSTM-CRF by 0.14% in F-measure on CWS and by 0.34% in F-measure on POS tagging. For NER, CWS information brought a maximum improvement of 0.34% in F-measure, while combined CWS and POS information brought a maximum improvement of 0.74% in F-measure.
Conclusions: Our proposed fine-grained CWS and POS tagging corpus is reliable and meaningful, as the output of the CWS and POS tagging systems developed on this corpus improved the performance of a Chinese clinical NER system on another, independent corpus.
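The precision, recall and F-measure reported for CWS can be illustrated with a minimal sketch. The abstract does not state how the scores were computed, so the code below assumes the common convention of counting a predicted word as correct only when its character span exactly matches a gold word. The function names and the example sentence are hypothetical and do not come from the paper or its corpus.

```python
def word_spans(words):
    """Convert a segmented sentence (list of words) into a set of
    (start, end) character offsets, one per word."""
    spans = set()
    start = 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def cws_prf(gold_words, pred_words):
    """Word-level precision, recall and F-measure for one sentence.

    Assumption: a predicted word is correct only if both of its
    boundaries match a gold word exactly.
    """
    gold = word_spans(gold_words)
    pred = word_spans(pred_words)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical example: the same five characters segmented two ways.
gold = ["患者", "无", "发热"]
pred = ["患者", "无发", "热"]
precision, recall, f_measure = cws_prf(gold, pred)
print(precision, recall, f_measure)
```

In this example only the first word is segmented correctly, so one of three predicted words and one of three gold words match. Corpus-level scores would normally accumulate the counts over all sentences before dividing, rather than averaging per-sentence values.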

Lexicon-based Semi-CRF for Chinese Clinical Text Word Segmentation

CRF with Locality-Consistent Dictionary Learning for Semantic Segmentation

Joint CRF and Locality-Consistent Dictionary Learning for Semantic Segmentation.

A CRF-based Method for Automatic Construction of Chinese Symptom Lexicon

Chinese Word Segmentation Via BiLSTM+Semi-CRF with Relay Node

A BiLSTM-CRF Based Approach to Word Segmentation in Chinese

CRFs Based Chinese Word Segmentation

A Fine-Grained Chinese Word Segmentation and Part-of-speech Tagging Corpus for Clinical Text

A Hybrid Approach to Chinese Word Segmentation around CRFs

Chinese Word Segmentation in Electronic Medical Record Text via Graph Neural Network-Bidirectional LSTM-CRF Model

Dictionary Chinese Word Segmentation Research a Method Combined with CRFs

Word Segmentation on Micro-blog Texts with External Lexicon and Heterogeneous Data.

CRFs-based Chinese word segmentation method with character position probability feature

Chinese lexical analysis based on hidden semi-CRF

A Lexicon-Corpus-Based Unsupervised Chinese Word Segmentation Approach

A morphology-based Chinese word segmentation method

Leveraging Rich Linguistic Features for Cross-domain Chinese Segmentation.

Bidirectional LSTM-CRF Attention-based Model for Chinese Word Segmentation

Chinese Word Segmentation based on Word Boundary Classification

Parsing-based Chinese word segmentation integrating morphological and syntactic information

Incorporate Web Search Technology to Solve Out-of-Vocabulary Words in Chinese Word Segmentation.