The Construction and Utilization of a Comprehensive Language Knowledge-base

俞士汶,段慧明,朱学锋,张化瑞
DOI: https://doi.org/10.3969/j.issn.1003-0077.2004.05.001
2004-01-01
Abstract: The scale and quality of its knowledge-base determine the success or failure of a natural language processing system. After 18 years of diligent work, the Institute of Computational Linguistics at Peking University has accumulated a series of language data resources of good quality and considerable scale: the grammatical knowledge-base of contemporary Chinese, the large-scale POS-tagged corpus of contemporary Chinese, the Semantics Knowledge-base of Contemporary Chinese (SKCC), the Chinese Concept Dictionary (CCD), a bilingual parallel corpus aligned at different unit levels, term banks for different disciplines, the phrase structure knowledge-base of contemporary Chinese, and a corpus of ancient Chinese poems. The present research will integrate these language data resources into one unified, comprehensive language knowledge-base. In incorporating these different resources, the gaps between them must be filled. The planned comprehensive language knowledge-base will provide not only a friendly user interface and a convenient application programming interface but also various software tools supporting knowledge mining. The research will thus promote the continual development of the existing language data resources from primary products into deeply processed products. It will establish diversified mechanisms for knowledge dissemination and information services, offering omnidirectional, multi-level support to language information processing, traditional linguistics research, and language teaching.