Abstract: Parsing, the task of identifying syntactic constituents (e.g., noun and verb phrases) in a sentence, is one of the fundamental tasks in natural language processing. Many natural language applications, such as spoken-language understanding, machine translation, and information extraction, would benefit from, or even require, high-accuracy parsing as a preprocessing step. Although most state-of-the-art statistical parsers were originally built for English, most of them are not language-specific: they do not rely on properties peculiar to English. Building a parser for a given language therefore becomes largely a matter of retraining the statistical parameters on a treebank in that language. The development of the Chinese Treebank [Xia et al. 2000] spurred the construction of parsers for Chinese. However, Chinese poses some unique problems for statistical parsing, the most apparent being word segmentation. Because written Chinese does not delimit words with spaces as Western languages do, the word boundaries must be identified before an existing statistical method can be applied. Most pre-existing Chinese parsers neglect this step and assume that the input has already been segmented. This article describes a character-based statistical parser that gives the best performance to date on the Chinese Treebank data. We augment an existing maximum entropy parser with transformation-based learning, creating a parser that can operate at the character level. Experiments show that our parser achieves results close to those achievable under perfect word segmentation.
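The abstract does not spell out how transformation-based learning (TBL) is combined with the maximum entropy parser. As a rough illustration of the TBL idea alone, the sketch below treats word segmentation as character-level B/I tagging (B marks the first character of a word, I a word-internal one) and greedily learns rules that correct an initial segmentation. The tag set, the two rule templates, the "every character is its own word" initial annotator, and all function names are assumptions made for this example, not the authors' implementation; in the system described above TBL augments a maximum entropy model, whereas here it runs standalone.

```python
from collections import Counter
from typing import List, Tuple

# A rule (from_tag, to_tag, template, value) rewrites from_tag -> to_tag at a
# character position when the template condition holds. Both templates are
# illustrative assumptions:
#   "cur_char":  the character at this position equals value
#   "prev_char": the character just before this position equals value
Rule = Tuple[str, str, str, str]
TAGS = ("B", "I")  # B = word-initial character, I = word-internal character


def words_to_tags(words: List[str]) -> Tuple[List[str], List[str]]:
    """Flatten a segmented sentence into characters with B/I boundary tags."""
    chars: List[str] = []
    tags: List[str] = []
    for word in words:
        for k, ch in enumerate(word):
            chars.append(ch)
            tags.append("B" if k == 0 else "I")
    return chars, tags


def tags_to_words(chars: List[str], tags: List[str]) -> List[str]:
    """Rebuild words from characters and their B/I tags."""
    words: List[str] = []
    for ch, t in zip(chars, tags):
        if t == "B" or not words:
            words.append(ch)
        else:
            words[-1] += ch
    return words


def fires(rule: Rule, chars: List[str], tags: List[str], i: int) -> bool:
    """True if the rule's from-tag and context condition both match at position i."""
    frm, _, template, value = rule
    if tags[i] != frm:
        return False
    if template == "cur_char":
        return chars[i] == value
    if template == "prev_char":
        return i > 0 and chars[i - 1] == value
    return False


def apply_rule(rule: Rule, chars: List[str], tags: List[str]) -> List[str]:
    # Conditions inspect only characters and the tag being rewritten, so
    # rewriting every matching position at once is safe.
    return [rule[1] if fires(rule, chars, tags, i) else t for i, t in enumerate(tags)]


def initial_tags(chars: List[str]) -> List[str]:
    # Initial-state annotator (an assumption): every character is its own word.
    return ["B"] * len(chars)


def learn(corpus: List[Tuple[List[str], List[str]]],
          max_rules: int = 20, min_gain: int = 1) -> List[Rule]:
    """Greedy TBL: repeatedly keep the rule with the largest net error reduction."""
    current = [initial_tags(chars) for chars, _ in corpus]
    rules: List[Rule] = []
    for _ in range(max_rules):
        gains: Counter = Counter()
        for (chars, gold), tags in zip(corpus, current):
            for i, (t, g) in enumerate(zip(tags, gold)):
                contexts = [("cur_char", chars[i])]
                if i > 0:
                    contexts.append(("prev_char", chars[i - 1]))
                for to in TAGS:
                    if to == t:
                        continue
                    for template, value in contexts:
                        # +1 if the rewrite fixes an error, -1 if it breaks a correct tag.
                        gains[(t, to, template, value)] += 1 if to == g else (-1 if t == g else 0)
        if not gains:
            break
        best, gain = max(gains.items(), key=lambda kv: kv[1])
        if gain < min_gain:
            break
        rules.append(best)
        current = [apply_rule(best, chars, tags) for (chars, _), tags in zip(corpus, current)]
    return rules


def tag(chars: List[str], rules: List[Rule]) -> List[str]:
    """Run the initial annotator, then apply the learned rules in order."""
    tags = initial_tags(chars)
    for rule in rules:
        tags = apply_rule(rule, chars, tags)
    return tags


# Toy usage: learn from three hand-segmented phrases, then segment an unseen one.
corpus = [words_to_tags(s) for s in (["中国", "人民"], ["中国", "银行"], ["人民", "银行"])]
rules = learn(corpus)
chars = list("人民中国")
print(tags_to_words(chars, tag(chars, rules)))  # ['人民', '中国']
```

Because each learned rule is applied to the output of the previous ones, the final rule list is an ordered, inspectable correction procedure; this is the property that makes TBL a natural post-processor for errors left by a statistical model.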

Parsing-based Automatic Chinese Term Extraction

Research on Automatic Chinese Multi-word Term Extraction Based on Term Component

Research on Automatic Chinese Multi-word Term Extraction Based on Integration of Web Information and Term Component

Measuring Termhood in Automatic Terminology Extraction

Parsing Named Entity As Syntactic Structure

Automatic Extraction of Domain-Specific Terms

A Survey of Term Recognition and Extraction for Domain-Specific Chinese Text Information Processing

Parsing-based Chinese word segmentation integrating morphological and syntactic information

Automatic Corpus-Based Extraction of Chinese Legal Terms

Extracting Noun Phrases from Large-Scale Texts: A Hybrid Approach and Its Automatic Evaluation

Chinese Dependency Parsing Based on Treebank

End-to-End Chinese Parsing Exploiting Lexicons

[Automatic labeling and extraction of terms in natural language processing in acupuncture clinical literature]

A Maximum-Entropy Chinese Parser Augmented by Transformation-Based Learning

Parsing TCT with a Coarse-to-Fine Approach

Open Domain Chinese Triples Hierarchical Extraction Method

Bilingual Terminology Extraction Using Multi-level Termhood

Automatic Recognition of Chinese Scientific and Technological Terms Using Integrated Linguistic Knowledge

Automatic Keyphrase Extraction from Chinese News Documents

An Interactive Approach to Term Relation Extraction and Term Extraction

Leverage External Knowledge and Self-attention for Chinese Semantic Dependency Graph Parsing