Abstract: Words provide a useful source of information for Chinese NLP, and word segmentation has been taken as a pre-processing step for most downstream tasks. For many NLP tasks, however, word segmentation can introduce noise and lead to error propagation. The rise of neural representation learning models allows sentence-level semantic information to be collected from characters directly. As a result, it is an empirical question whether a fully character-based model should be used instead of first performing word segmentation. We investigate a neural representation that simultaneously encodes character and word information without the need for segmentation. In particular, candidate words are found in a sentence by matching against a pre-defined lexicon. A lattice-structured LSTM is used to encode the resulting word-character lattice, where gate vectors control information flow through words, so that the more useful words can be automatically identified by end-to-end training. We compare the performance of the resulting lattice LSTM with baseline sequence LSTM structures over both character sequences and automatically segmented word sequences. Results on NER show that the character-word lattice model significantly improves performance. In addition, as a general sentence representation architecture, the character-word lattice LSTM can also be used for learning contextualized representations. To this end, we compare the lattice LSTM structure with its sequential LSTM counterpart, namely ELMo. Results show that our lattice version of ELMo gives better language modeling performance. On Chinese POS tagging, chunking, and syntactic parsing tasks, the resulting contextualized Chinese embeddings also give better performance than ELMo trained on the same data.
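
The lexicon-matching step described in the abstract can be illustrated with a minimal sketch. It only shows how candidate words might be collected as spans over a character sequence; it does not implement the lattice LSTM or its gates. The function name `build_lattice`, the `max_word_len` cutoff, the toy lexicon, and the example sentence are all assumptions made for illustration and are not taken from the paper.

```python
def build_lattice(sentence, lexicon, max_word_len=4):
    """Collect (start, end, word) spans for every multi-character
    substring of `sentence` that appears in `lexicon`.

    Hypothetical helper: brute-force substring matching, with `end`
    exclusive. Single characters are skipped because they already
    form the character backbone of the lattice.
    """
    edges = []
    n = len(sentence)
    for start in range(n):
        # Try every substring of length 2..max_word_len starting here.
        for end in range(start + 2, min(n, start + max_word_len) + 1):
            word = sentence[start:end]
            if word in lexicon:
                edges.append((start, end, word))
    return edges


# Toy lexicon and sentence, chosen only for illustration.
lexicon = {"南京", "南京市", "市长", "长江", "大桥", "长江大桥"}
sentence = "南京市长江大桥"
edges = build_lattice(sentence, lexicon)

for start, end, word in edges:
    print(start, end, word)
```

Overlapping spans such as "南京市" and "市长" are both kept. No segmentation decision is made at this stage, which matches the abstract's point that the choice among candidate words is left to the gates learned during end-to-end training.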

Character-Level Syntax Infusion in Pre-Trained Models for Chinese Semantic Role Labeling

Semantic Role Labeling Integrated with Multilevel Linguistic Cues and Bi-LSTM-CRF

An MRC Framework for Semantic Role Labeling

Syntax-Enhanced Self-Attention-Based Semantic Role Labeling

Syntax-aware Neural Semantic Role Labeling

A Unified Syntax-aware Framework for Semantic Role Labeling

A Syntax-aware Multi-task Learning Framework for Chinese Semantic Role Labeling

Chinese Semantic Role Labeling Based on Conditional Random Fields

Syntax Aware LSTM Model for Chinese Semantic Role Labeling

Semantic Role Labeling with Heterogeneous Syntactic Knowledge

Syntax Role for Neural Semantic Role Labeling

A Full End-to-End Semantic Role Labeler, Syntax-agnostic over Syntax-aware?

Semantic Role Labeling for Learner Chinese: the Importance of Syntactic Parsing and L2-L1 Parallel Data

Adaptive Convolution for Semantic Role Labeling

Chinese Semantic Role Labeling with Shallow Parsing

Chinese Semantic Role Labeling with Bidirectional Recurrent Neural Networks

Exploiting Word Semantics to Enrich Character Representations of Chinese Pre-trained Models

Syntax-aware Multilingual Semantic Role Labeling

An Improving SRL Model with Word Sense Information Using an Improved Synergetic Neural Network Model

Lattice LSTM for Chinese Sentence Representation

Sub-Character Tokenization for Chinese Pretrained Language Models