Abstract: A predicate head is a verbal expression that serves as the structural center of a sentence. Identifying predicate heads is critical to understanding a sentence, because the predicate head plays the leading role in organizing the sentence's other syntactic elements, including subject elements, adverbial elements, etc. In some languages, such as English, word morphology is a valuable cue for identifying predicate heads. Chinese, however, offers no morphological information to indicate a word's grammatical role. A Chinese sentence often contains several verbal expressions, and identifying which of them plays the role of the predicate head is not an easy task. Furthermore, Chinese sentences are loosely structured and provide no delimiters between words. Identifying Chinese predicate heads therefore poses significant challenges. In Chinese information extraction, little work has been done on predicate head recognition, and no generally accepted evaluation dataset supports work in this important area. This paper presents the first attempt to develop an annotation guideline for Chinese predicate heads and their relevant syntactic elements. The guideline emphasizes the role of the predicate as the structural center of a sentence, and the annotation of relevant syntactic elements follows the same principle. A number of design decisions serve this goal, e.g., patterns of predicate heads, a flattened annotation structure, and a simpler syntactic unit type. Based on the proposed guideline, more than 1,500 documents were manually annotated. The corpus will be available online for public access. With this guideline and annotated corpus, we aim to broadly advance research in Chinese information extraction and to provide the research community with a critical resource that has long been lacking.
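The abstract does not specify a data format for the annotations. As a minimal sketch, assuming a character-offset representation (a natural fit because Chinese text carries no word delimiters), the Python below shows what a flattened annotation of one predicate head and its relevant elements might look like: every element sits at a single level next to the predicate, with no nested constituents. The class names `Span` and `FlatAnnotation`, the labels PRED/SUBJ/ADV/OBJ, and the span boundaries chosen for the example sentence are all hypothetical illustrations, not the guideline's actual scheme.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Span:
    """A character-offset span [start, end) over the raw, unsegmented sentence."""
    label: str  # hypothetical labels: "PRED", "SUBJ", "ADV", "OBJ"
    start: int
    end: int


@dataclass
class FlatAnnotation:
    """One predicate head plus its relevant elements, all on one level (no nesting)."""
    sentence: str
    predicate: Span
    elements: list[Span] = field(default_factory=list)

    def text(self, span: Span) -> str:
        """Return the surface string covered by a span."""
        return self.sentence[span.start:span.end]

    def validate(self) -> None:
        """Check that spans are in range and, since the structure is flat, never overlap."""
        spans = sorted([self.predicate, *self.elements], key=lambda s: s.start)
        for s in spans:
            if not (0 <= s.start < s.end <= len(self.sentence)):
                raise ValueError(f"span out of range: {s}")
        for a, b in zip(spans, spans[1:]):
            if a.end > b.start:
                raise ValueError(f"overlapping spans: {a} and {b}")


# "He read a book in the library yesterday." Span boundaries are illustrative only.
sent = "他昨天在图书馆看了一本书。"
ann = FlatAnnotation(
    sentence=sent,
    predicate=Span("PRED", 7, 8),  # 看 (read)
    elements=[
        Span("SUBJ", 0, 1),  # 他
        Span("ADV", 1, 3),   # 昨天
        Span("ADV", 3, 7),   # 在图书馆
        Span("OBJ", 9, 12),  # 一本书
    ],
)
ann.validate()
for s in [ann.predicate, *ann.elements]:
    print(s.label, ann.text(s))
```

Keeping every element as a sibling span of the predicate, rather than as a tree, mirrors the abstract's stated preference for a flattened structure centered on the predicate head; it also makes a simple non-overlap check sufficient for validation.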
