Abstract: Automatic phonetic transcription plays a crucial role in building resources for Korean phonetic information processing. However, because Korean word formation is highly productive and new words emerge regularly, no database can contain every word. Annotating the pronunciation of unregistered words outside the database, known as out-of-vocabulary (OOV) words, has therefore become a problem that Korean natural language processing must solve. Current approaches to grapheme-to-phoneme (G2P) conversion are generally either knowledge-driven or data-driven. Purely knowledge-driven methods are difficult to adapt to real-world settings with large amounts of data. Purely data-driven methods depend entirely on high-quality data, make it difficult to choose input variables in a principled way, and require adequate, carefully selected model features. To address these issues, this paper proposes an automatic Korean G2P method that fuses knowledge-driven and data-driven approaches. First, we extract feature values based on Korean pronunciation rules and the phonetic change rules that apply between words, and feed them into the model for training. The data-driven model, which can better fit the mapping between input and output variables, is then trained to perform automatic phonetic transcription of Korean. The proposed model reflects the phonological changes that occur in continuous Korean speech and accurately obtains the phonemes corresponding to the graphemes. The method's validity and superiority were verified by cross-validation, and its average grapheme-to-phoneme conversion accuracy reaches 94.63%.
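To make the idea of a rule-derived feature concrete, the following is a minimal sketch and not the paper's actual feature set or model. It decomposes Hangul syllables into jamo using the standard Unicode arithmetic and applies one well-known pronunciation rule, liaison, in which a simple final consonant moves into the next syllable when that syllable begins with the silent ㅇ. The names `decompose`, `compose`, `apply_liaison`, and `SIMPLE_FINALS` are hypothetical, introduced only for illustration. The sketch ignores palatalization, consonant clusters, final ㅎ, and morpheme-boundary effects, so its output is only an approximation that a data-driven model could take as one input feature.

```python
# Jamo tables in Unicode order for precomposed Hangul syllables (U+AC00..U+D7A3).
INITIALS = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")
MEDIALS = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")
FINALS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")

# Assumption for this sketch: only single finals that also exist as initials
# are moved; ㅇ, ㅎ and cluster finals are left untouched.
SIMPLE_FINALS = set("ㄱㄲㄴㄷㄹㅁㅂㅅㅆㅈㅊㅋㅌㅍ")

HANGUL_BASE = 0xAC00
SYLLABLE_COUNT = 19 * 21 * 28  # 11172 precomposed syllables


def decompose(syllable):
    """Split one precomposed Hangul syllable into (initial, medial, final)."""
    index = ord(syllable) - HANGUL_BASE
    if not 0 <= index < SYLLABLE_COUNT:
        raise ValueError(f"not a precomposed Hangul syllable: {syllable!r}")
    initial, rest = divmod(index, 21 * 28)
    medial, final = divmod(rest, 28)
    return INITIALS[initial], MEDIALS[medial], FINALS[final]


def compose(initial, medial, final=""):
    """Inverse of decompose: build a syllable from its jamo."""
    index = (INITIALS.index(initial) * 21 + MEDIALS.index(medial)) * 28
    return chr(HANGUL_BASE + index + FINALS.index(final))


def apply_liaison(word):
    """Move a simple final consonant onto a following ㅇ-initial syllable."""
    jamo = [list(decompose(s)) for s in word]
    for cur, nxt in zip(jamo, jamo[1:]):
        if cur[2] in SIMPLE_FINALS and nxt[0] == "ㅇ":
            nxt[0], cur[2] = cur[2], ""
    return "".join(compose(*j) for j in jamo)


if __name__ == "__main__":
    for word in ["한국어", "먹었어"]:
        print(word, "->", apply_liaison(word))
```

In a fusion setup of the kind the abstract describes, the rule-adjusted jamo sequence could be supplied to the model alongside the raw graphemes, leaving the model to learn the cases the hand-written rule gets wrong.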

Data-driven grapheme-to-phoneme representations for a lexicon-free text-to-speech

Improving grapheme-to-phoneme conversion by learning pronunciations from speech recordings

Neural Machine Translation for Multilingual Grapheme-to-Phoneme Conversion

LiteG2P: A fast, light and high accuracy model for grapheme-to-phoneme conversion

r-G2P: Evaluating and Enhancing Robustness of Grapheme to Phoneme Conversion by Controlled noise introducing and Contextual information incorporation

Improving Grapheme-to-Phoneme Conversion through In-Context Knowledge Retrieval with Large Language Models

Good Neighbors Are All You Need for Chinese Grapheme-to-Phoneme Conversion

Multilingual context-based pronunciation learning for Text-to-Speech

Grapheme-to-Phoneme Transformer Model for Transfer Learning Dialects

Integration of Knowledge-Driven and Data-Driven Based Korean Phonetic Transcription

Near-Optimal Active Learning for Multilingual Grapheme-to-Phoneme Conversion

Integrating prior knowledge and data-driven approaches for improving grapheme-to-phoneme conversion in Korean language

Transformer based Grapheme-to-Phoneme Conversion

Neural Grapheme-To-Phoneme Conversion with Pre-Trained Grapheme Models

LLM-Powered Grapheme-to-Phoneme Conversion: Benchmark and Case Study

DNN-based Speech Synthesis for Indian Languages from ASCII text

Mitigating the Exposure Bias in Sentence-Level Grapheme-to-Phoneme (G2P) Transduction

Dict-TTS: Learning to Pronounce with Prior Dictionary Knowledge for Text-to-Speech

LSTM Acoustic Models Learn to Align and Pronounce with Graphemes

Token-Level Ensemble Distillation for Grapheme-to-Phoneme Conversion