Character-Level Chinese Dependency Parsing via Modeling Latent Intra-Word Structure

Yang Hou, Zhenghua Li

2024-06-06

Abstract: Revealing the syntactic structure of sentences in Chinese poses significant challenges for word-level parsers due to the absence of clear word boundaries. To facilitate a transition from word-level to character-level Chinese dependency parsing, this paper proposes modeling latent internal structures within words. In this way, each word-level dependency tree is interpreted as a forest of character-level trees. A constrained Eisner algorithm is implemented to ensure the compatibility of character-level trees, guaranteeing a single root for intra-word structures and establishing inter-word dependencies between these roots. Experiments on Chinese treebanks demonstrate the superiority of our method over both the pipeline framework and previous joint models. A detailed analysis reveals that a coarse-to-fine parsing strategy empowers the model to predict more linguistically plausible intra-word structures.
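The abstract's notion of compatibility can be made concrete with a small example. The minimal Python sketch below, offered as an illustration rather than the paper's implementation, expands a word-level tree into one character-level tree. It keeps intra-word arcs inside each word and redirects every inter-word arc so that it links the single root characters of the two words. The encoding (1-based heads, 0 for the root), the function name `word_to_char_tree`, and the intra-word analyses in the usage example are all hypothetical choices, not the paper's data format or annotations.

```python
from typing import List


def word_to_char_tree(words: List[str], word_heads: List[int],
                      intra_heads: List[List[int]]) -> List[int]:
    """Expand a word-level tree into one compatible character-level tree.

    words:       segmented sentence, e.g. ["我", "喜欢", "自然语言"]
    word_heads:  1-based head word of each word (0 = sentence root)
    intra_heads: hypothetical encoding: for each word, a 1-based head index
                 *within the word* per character, with 0 marking the word's root
    Returns the 1-based head of every character (0 = sentence root).
    Only the single-root condition is checked, not acyclicity.
    """
    # offsets[i] = number of characters before word i
    offsets, total = [], 0
    for w in words:
        offsets.append(total)
        total += len(w)

    # Each word must have exactly one root character.
    roots = []
    for w, heads in zip(words, intra_heads):
        if len(heads) != len(w) or heads.count(0) != 1:
            raise ValueError(f"word {w!r} must have exactly one intra-word root")
        roots.append(heads.index(0))  # 0-based position inside the word

    char_heads = []
    for i, heads in enumerate(intra_heads):
        for h in heads:
            if h != 0:
                # Intra-word arc: shift the in-word index to a sentence index.
                char_heads.append(offsets[i] + h)
            elif word_heads[i] == 0:
                # Root character of the root word attaches to the sentence root.
                char_heads.append(0)
            else:
                # Inter-word arc: this word's root -> the head word's root.
                hw = word_heads[i] - 1
                char_heads.append(offsets[hw] + roots[hw] + 1)
    return char_heads


if __name__ == "__main__":
    # Hypothetical analysis: 我 <- 喜欢 (root) -> 自然语言, where 喜 heads 喜欢
    # and 言 heads 自然语言.
    print(word_to_char_tree(["我", "喜欢", "自然语言"], [2, 0, 2],
                            [[0], [0, 1], [2, 4, 4, 0]]))  # [2, 0, 2, 5, 7, 7, 2]
```

Different intra-word analyses of the same word-level tree yield different character-level trees. The set of all such trees is the "forest" the abstract refers to.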

Computation and Language

What problem does this paper attempt to address?

The paper mainly addresses two key issues in Chinese dependency parsing:

1. **Challenges posed by the lack of explicit word boundaries**: Because Chinese text has no explicit word boundaries, traditional dependency parsers rely on word-level treebanks and require the text to be segmented before parsing. This adds an extra processing step and makes parsing results sensitive to segmentation errors.
2. **Transition from word-level to character-level parsing**: To avoid this problem, researchers have turned to character-level Chinese dependency parsing. However, because character-level Chinese treebanks are lacking, word-level trees must first be converted into character-level trees. Previous methods either required manual annotation of intra-word structures or used simple rules to define pseudo intra-word structures. The former is time-consuming, and the latter fails to accurately represent the syntactic roles of characters.

To address these issues, the paper proposes modeling latent intra-word structures for character-level Chinese dependency parsing. This lets the model implicitly represent all possible internal structures of a word, and a constrained Eisner algorithm ensures that the resulting character-level trees are compatible with the word-level trees. In addition, a coarse-to-fine parsing strategy is proposed to improve parsing accuracy and to produce intra-word structures that better conform to linguistic principles. Experiments show that the method outperforms the pipeline framework and previous joint models on Chinese treebanks. Further analysis shows that the proposed constraints are important for parsing performance and for the well-formedness of the output trees, and the distribution of predicted intra-word structures indicates that the method can effectively infer complex intra-word structures.
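The paper enforces compatibility with a constrained Eisner algorithm, whose details are not reproduced here. As a hedged stand-in for the training-time case, where the segmentation and word-level tree are known and only the intra-word structure is latent, the sketch below uses a different technique. It brute-forces every projective single-rooted tree inside each word, then runs a dynamic program over the word-level tree to pick the highest-scoring compatible character-level tree. The arc scorer `score(h, d)` and all function names are hypothetical. The enumeration is exponential in word length and is only practical because Chinese words are short. Assuming the word-level tree is projective, linking word roots this way keeps the character-level tree projective too, since every word covers a contiguous span.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Tree = Tuple[int, ...]  # heads within a word: 1-based, 0 marks the word root


def _is_tree(heads: Tree) -> bool:
    """True if following heads from every character reaches 0 without a cycle."""
    for i in range(1, len(heads) + 1):
        seen, node = set(), i
        while node != 0:
            if node in seen:
                return False
            seen.add(node)
            node = heads[node - 1]
    return True


def _is_projective(heads: Tree) -> bool:
    """True if every character strictly between h and d descends from h."""
    def dominates(h: int, k: int) -> bool:
        while k != 0:
            k = heads[k - 1]
            if k == h:
                return True
        return False

    for d, h in enumerate(heads, start=1):
        if h != 0 and not all(dominates(h, k)
                              for k in range(min(h, d) + 1, max(h, d))):
            return False
    return True


def intra_word_trees(n: int) -> List[Tree]:
    """Brute-force every projective, single-rooted structure over n characters."""
    return [t for t in product(range(n + 1), repeat=n)
            if t.count(0) == 1 and _is_tree(t) and _is_projective(t)]


def best_compatible_char_tree(words: List[str], word_heads: List[int],
                              score: Callable[[int, int], float]
                              ) -> Tuple[float, List[int]]:
    """Highest-scoring character tree compatible with a fixed word-level tree.

    word_heads are 1-based word indices (0 = sentence root); score(h, d) scores
    an arc between 1-based character positions (h = 0 is the sentence root).
    """
    offsets, n_chars = [], 0
    for w in words:
        offsets.append(n_chars)
        n_chars += len(w)
    children: Dict[int, List[int]] = {i: [] for i in range(len(words) + 1)}
    for i, h in enumerate(word_heads, start=1):
        children[h].append(i)

    # Best latent intra-word structure for each (word, root character) pair.
    intra: Dict[Tuple[int, int], Tuple[float, Tree]] = {}
    for i, w in enumerate(words, start=1):
        off = offsets[i - 1]
        for t in intra_word_trees(len(w)):
            s = sum(score(off + h, off + d) for d, h in enumerate(t, 1) if h)
            r = t.index(0) + 1
            if (i, r) not in intra or s > intra[(i, r)][0]:
                intra[(i, r)] = (s, t)

    memo: Dict[Tuple[int, int], Tuple[float, List[Tuple[int, int]]]] = {}

    def best(i: int, r: int):
        """Best subtree score for word i when its root is its r-th character."""
        if (i, r) not in memo:
            total, picks = intra[(i, r)][0], []
            for c in children[i]:
                # Inter-word arc: root of word i -> chosen root of child word c.
                s, rc = max((score(offsets[i - 1] + r, offsets[c - 1] + rc)
                             + best(c, rc)[0], rc)
                            for rc in range(1, len(words[c - 1]) + 1))
                total += s
                picks.append((c, rc))
            memo[(i, r)] = (total, picks)
        return memo[(i, r)]

    heads = [0] * n_chars

    def fill(i: int, r: int, head_pos: int) -> None:
        """Write the chosen intra- and inter-word arcs for word i's subtree."""
        off = offsets[i - 1]
        for d, h in enumerate(intra[(i, r)][1], 1):
            heads[off + d - 1] = off + h if h else head_pos
        for c, rc in best(i, r)[1]:
            fill(c, rc, off + r)

    root_word = children[0][0]
    s, r = max((score(0, offsets[root_word - 1] + r) + best(root_word, r)[0], r)
               for r in range(1, len(words[root_word - 1]) + 1))
    fill(root_word, r, 0)
    return s, heads


if __name__ == "__main__":
    # Hypothetical scorer that rewards one analysis of 我 喜欢 自然语言.
    gold = {(2, 1), (0, 2), (2, 3), (5, 4), (7, 5), (7, 6), (2, 7)}
    print([len(intra_word_trees(n)) for n in range(1, 5)])  # [1, 2, 7, 30]
    print(best_compatible_char_tree(["我", "喜欢", "自然语言"], [2, 0, 2],
                                    lambda h, d: 1.0 if (h, d) in gold else 0.0))
    # (7.0, [2, 0, 2, 5, 7, 7, 2])
```

The counts 1, 2, 7, and 30 are the number of latent structures for words of one to four characters, which shows how quickly the space that the model implicitly marginalizes over grows.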

