From Speech to Text in Chinese: A Computer-Aided Transcription System for the Legal Domain.

Benjamin Ka-Yin T'sou, King Kui Sin, Samuel W. K. Chan, Tom B. Y. Lai, Lawrence Y. L. Cheung, K. T. Ko, Gary K. K. Chan
DOI: https://doi.org/10.5555/2835865.2835907
2000-01-01
Abstract: Following the reversion of sovereignty from Britain to China in 1997, newly introduced legal bilingualism in Hong Kong has created an urgent need for a Computer-Aided Transcription (CAT) system for Chinese. The production and retention of verbatim records of court proceedings is vital to maintaining the Common Law system. The existing monolingual English CAT has to be adapted in order to produce legally tenable records of court proceedings in Cantonese, the predominant Chinese dialect in Hong Kong. There are two major challenges in the design of a Chinese CAT system. First, linguistic differences in phonology and orthography mandate the adoption of a new mechanism for converting stenograph code into Chinese. The key issue lies in resolving the ambiguity that arises from problematic homonymy in the Chinese language. With the support of a 0.85-million-character corpus, a bigram statistical model has been adopted to compute the most likely Chinese character string for each sequence of stenograph code input. Second, to ensure compatibility of Chinese and English stenography, the Chinese system has to retain as much as possible of the user interface of the existing English stenography (e.g. operational procedure and stenograph keyboard layout). The changes in the underlying conversion mechanism are made transparent to the stenographers. Our prototype Chinese CAT system can now achieve over 95% transcription accuracy.
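The bigram-based conversion described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the stenograph code set, the smoothing method, or the search procedure, so everything below is an assumption made for illustration: the codes `FAAT`, `GUN`, and `TING` are hypothetical romanized placeholders rather than real stenograph codes, the lexicon and five-string corpus are invented toy data, add-one smoothing is used for unseen bigrams, and a Viterbi search picks the highest-scoring character string.

```python
import math
from collections import Counter

START = "<s>"  # sentence-start marker used as the first bigram context


def train_bigram(corpus):
    """Count bigram contexts and bigrams over a list of character strings."""
    context, bigram, vocab = Counter(), Counter(), set()
    for sentence in corpus:
        chars = [START] + list(sentence)
        vocab.update(sentence)
        for prev, cur in zip(chars, chars[1:]):
            context[prev] += 1
            bigram[(prev, cur)] += 1
    return context, bigram, len(vocab)


def log_prob(prev, cur, model):
    """Add-one smoothed log P(cur | prev); the smoothing is an assumption."""
    context, bigram, vocab_size = model
    return math.log((bigram[(prev, cur)] + 1) / (context[prev] + vocab_size))


def decode(codes, lexicon, model):
    """Viterbi search for the most likely character string for a code sequence."""
    # best maps each candidate character to (log score, best path ending in it)
    best = {START: (0.0, "")}
    for code in codes:
        new_best = {}
        for cur in lexicon[code]:
            new_best[cur] = max(
                (score + log_prob(prev, cur, model), path + cur)
                for prev, (score, path) in best.items()
            )
        best = new_best
    return max(best.values())[1]


# Toy data (hypothetical): each code maps to several homophonous characters.
lexicon = {"FAAT": ["法", "發"], "GUN": ["官", "觀"], "TING": ["庭", "停"]}
corpus = ["法官", "法官", "法庭", "發現", "觀察"]
model = train_bigram(corpus)

print(decode(["FAAT", "GUN"], lexicon, model))   # 法官
print(decode(["FAAT", "TING"], lexicon, model))  # 法庭
```

In this sketch each code is ambiguous in isolation, and the bigram counts from the corpus decide between the candidates. A real system would need a far larger corpus (the abstract mentions 0.85 million characters) and a lexicon derived from the actual stenograph code.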