Abstract: Singlish is of interest to the computational linguistics community both linguistically, as a major low-resource creole based on English, and computationally, for information extraction and sentiment analysis of regional social media. In our conference paper, Wang et al. (2017), we investigated part-of-speech (POS) tagging and dependency parsing for Singlish by constructing a treebank under the Universal Dependencies scheme, and we successfully used neural stacking models to integrate English syntactic knowledge, achieving state-of-the-art accuracies of 89.50% and 84.47% for Singlish POS tagging and dependency parsing, respectively. In this work, we substantially extend Wang et al. (2017) by enlarging the Singlish treebank to more than triple its size, with much greater topical diversity, and by further exploring neural multi-task models for integrating English syntactic knowledge. Results show that the enlarged treebank yields significant relative error reductions of 45.8% and 15.5% on the base model, 27% and 10% on the neural multi-task model, and 21% and 15% on the neural stacking model for POS tagging and dependency parsing, respectively. Moreover, the state-of-the-art Singlish POS tagging and dependency parsing accuracies improve to 91.16% and 85.57%, respectively. We make our treebanks and models available for further research.
