Collection of a Chinese Spontaneous Telephone Speech Corpus and Proposal of Robust Rules for Robust Natural Language Parsing

Thomas Fang Zheng, Pengju Yan, Hui Sun, Mingxing Xu, Wenhu Wu
2002-01-01
Abstract: This paper introduces a Chinese Spontaneous Telephone Speech Corpus in the flight enquiry and reservation domain (CSTSC-Flight), comprising 6 GB of raw data with about 50 hours of valid speech, and outlines its collection and transcription principles. The spoken language phenomena found in the corpus are then analyzed. Based on this analysis, four types of grammatical rules are proposed to cover as many Chinese spoken language phenomena as possible for robust natural language parsing and understanding in spoken dialogue systems.