Designing and implementing a corpus-based online pronunciation learning platform for Cantonese learners of Mandarin

Abstract: As an international financial centre, Hong Kong is a metropolitan city that has become increasingly multilingual in recent years. In addition to Cantonese and English, which serve mostly as first and second languages, Hong Kong residents are increasingly developing a third or even a fourth language. The biliteracy and trilingualism (兩文三語) language policy promotes Mandarin as the third language. This paper introduces a corpus-based online pronunciation learning platform that helps Mandarin teachers, learners, and researchers better understand the major problems Cantonese-speaking learners in Hong Kong encounter when learning Mandarin pronunciation. A phonological corpus was established and analysed in order (a) to identify learners' recurring difficulties in using Mandarin segmental and suprasegmental features accurately and appropriately and (b) to suggest possible ways to reduce or eliminate these difficulties. The corpus contains recordings of four spoken tasks (reading monosyllabic words, reading multisyllabic words, reading a passage, and free speech) produced by Cantonese-speaking college students in Hong Kong. The phonological annotation of the recordings covers two segmental areas (vowels and consonants), two suprasegmental areas (tone and retroflex finals), and mispronunciations. Alongside the corpus, a pronunciation learning website was developed on which learners can (a) practise segmental and suprasegmental aspects of pronunciation through a variety of perception and production exercises and (b) discover the possible causes of the common Mandarin pronunciation errors found in the corpus. Forty datasets from the corpus were analysed, and a checklist of common Mandarin pronunciation errors made by Cantonese learners was made available to teachers and learners. The use and evaluation of the pronunciation learning platform are also introduced and discussed.
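The abstract does not give the corpus's annotation schema or the procedure used to turn the 40 analysed datasets into a checklist. The following is only a minimal sketch, assuming a hypothetical per-syllable annotation record (the `Annotation` class and its fields are illustrative, not the corpus's own format) and a hypothetical rule that an error pattern enters the checklist when at least `min_speakers` distinct learners produce it. The toy records are invented for illustration and are not corpus data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One annotated syllable (hypothetical schema, not the corpus's own)."""
    speaker: str   # learner ID
    task: str      # e.g. "monosyllabic", "multisyllabic", "passage", "free speech"
    area: str      # "vowel", "consonant", "tone" or "retroflex final"
    target: str    # expected Mandarin form in pinyin
    produced: str  # form the learner actually produced


def build_checklist(annotations, min_speakers=2):
    """Return (area, target, produced) error patterns made by at least
    `min_speakers` distinct learners, most widespread first."""
    speakers_per_error = {}
    for a in annotations:
        if a.target == a.produced:
            continue  # correct production: not an error
        key = (a.area, a.target, a.produced)
        speakers_per_error.setdefault(key, set()).add(a.speaker)
    # Count distinct speakers per pattern, not raw tokens.
    counts = Counter({key: len(s) for key, s in speakers_per_error.items()})
    return [(key, n) for key, n in counts.most_common() if n >= min_speakers]


# Toy records for illustration only; they are not data from the corpus.
records = [
    Annotation("S01", "monosyllabic", "consonant", "zhi", "zi"),
    Annotation("S02", "passage", "consonant", "zhi", "zi"),
    Annotation("S01", "monosyllabic", "tone", "ma3", "ma2"),
    Annotation("S02", "multisyllabic", "retroflex final", "huar", "hua"),
    Annotation("S03", "free speech", "vowel", "a", "a"),
]
print(build_checklist(records))  # [(('consonant', 'zhi', 'zi'), 2)]
```

Counting distinct speakers rather than error tokens keeps one learner who repeats the same mistake many times from pushing an idiosyncratic error onto a checklist meant to capture recurring difficulties across learners.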
