Abstract: This paper first gives a brief overview of the history of Chinese corpus development in mainland China, focusing on representative research projects of the last decade, such as the General Contemporary Chinese Corpus, sponsored by the State Language Commission of the Ministry of Education of China, and the Chinese Corpus of Situated Discourse in Beijing Area, built by the Chinese Academy of Social Sciences. It then elaborates on related work at Peking University on the design, annotation, and use of corpora. Four resources are discussed in detail: (1) a very large-scale Chinese corpus covering a wide time span, intended for linguistic research, with an online KWIC concordance based on the Web-Lucene search engine; (2) the People's Daily corpus, which has been processed with word segmentation and part-of-speech tagging; (3) a Chinese Treebank, from which Chinese phrase construction rules can be extracted automatically and the distribution of all kinds of phrases can be described statistically; and (4) a Chinese-English parallel corpus, on which a prototype workbench has been built to support Chinese-English lexicography. The final part of the paper briefly discusses some issues that have recently received more attention in this field, including the standardization of Chinese corpus encoding and approaches to sharing large-scale Chinese corpora for research and public use.

On Construction of a Chinese Corpus Based on Semantic Dependency Relations

Building a large Chinese corpus annotated with semantic dependency

The Construction of a Semantic Network of Chinese Lexical Associations Based on Large-Scale Corpora

Chinese Semantic Dependency Relation System and Treebank Construction

Chinese Statistical Parser Based on Semantic Dependencies

A Chinese corpus with word sense annotation

Construction of a Chinese Semantic Dictionary by Integrating Two Heterogeneous Dictionaries: TongYiCi Cilin and HowNet

Automatically Building Large-Scale Named Entity Recognition Corpora from Chinese Wikipedia

Research on Bilingual Dependency Relationship Mapping for Chinese-English Lexicon Construction

Construction of Semantic Collocation Bank Based on Semantic Dependency Parsing

A Study On Construction Of Modern Chinese Semantic Corpus

The Automatic Construction of Lexical Semantic Relationship Graph Based on HowNet

Building an Ellipsis-aware Chinese Dependency Treebank for Web Text

Recent Developments in Chinese Corpus Research

Constructing a WordNet-Based Multilingual Lexical-semantic Net: A Semi-automatic Method

Building a situation-based language knowledge base

Improving Chinese Dependency Parsing with Lexical Semantic Features

Semantic Relations Hierarchy and Knowledge Base Construction of Chinese Basic Noun Compounds

Constructing of a large-scale Chinese-English parallel corpus

Chinese-English Parallel Corpus Construction And Its Application

A Chinese Dependency Syntax for Treebanking