An Overview of the Advances and Applications of Online Chinese Language Resources over Three Decades

Weidong Zhan
DOI: https://doi.org/10.19689/j.cnki.cn10-1361/h.20180405
2018-01-01
Abstract: Over the past three decades, the empiricist paradigm has prevailed in natural language processing and other fields of language application, leading to a boom in online language data resources, including corpora, knowledge bases, and related search engines. For Chinese, numerous corpora, lexicons, and dictionaries, large and small, have been built and opened for search and research purposes, giving great impetus to Chinese language studies. This paper examines the development and application of online Chinese language resources and discusses their possible impact on linguistics and the challenges for their further development. First, it briefly introduces the background of corpus development. Second, it presents an overview of the Chinese language resources constructed from the 1990s to date. Third, it uses concrete examples to demonstrate the application of online resources in linguistic research and language teaching. Fourth, it discusses the challenges in constructing Chinese online language resources and the difficulties in applying them. In conclusion, it suggests a closer integration of introspection-based theoretical analysis and data-driven statistical analysis to benefit language studies.