Abstract: A large body of research seeking to explore how form affects lexical processing in bilinguals has suggested that orthographically similar translations (e.g., English-Portuguese "paper"/"papel") are responded to more quickly and accurately than translations with little or no overlap (e.g., English-Portuguese "house"/"casa"). One of the most prominent algorithms for estimating orthographic similarity, the normalized Levenshtein distance (NLD), returns an index of the proportion of identical characters between two strings, and is an efficient and invaluable tool for the selection, manipulation, and control of verbal stimuli. Notwithstanding its many advantages for second-language research, the absence of a comparable measure for phonology has led researchers to adopt different strategies to assess the degree of interlanguage phonological similarity, with profound implications for the interpretation of results on the relative roles of orthographic and phonological similarity in bilingual lexical access. In the present work, we introduce PHOR-in-One, a multilingual lexical database with a set of phonological and orthographic NLD estimates for 6,160 translation equivalents in American and British English, European Portuguese, German, and Spanish, totaling 30,800 words. We also propose a new measure, phonographic NLD, a pooled index of orthographic and phonological similarity that is particularly useful for researchers interested in controlling for and/or manipulating both estimates at once. PHOR-in-One includes a comprehensive characterization of its lexical entries, namely part-of-speech-dependent and -independent frequency counts, number of letters and phonemes, and phonetic transcriptions. PHOR-in-One is thus a valuable tool to support bilingual and multilingual research.
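To make the NLD concrete, here is a minimal sketch assuming the common formulation NLD(a, b) = lev(a, b) / max(|a|, |b|), where lev is the unit-cost Levenshtein edit distance. The abstract does not spell out the exact normalization PHOR-in-One uses, so treat this formula and the function names `levenshtein` and `normalized_levenshtein` as illustrative assumptions. Under this formulation, 0 means identical and 1 means maximally different, so 1 − NLD can be read as a similarity score. Because the functions accept any sequence of comparable items, the same code could be applied to phoneme lists for a phonological estimate, assuming the transcriptions are first tokenized into phoneme symbols.

```python
from collections.abc import Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Unit-cost edit distance (insertions, deletions, substitutions) via dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter sequence in the inner loop
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # delete item_a
                current[j - 1] + 1,      # insert item_b
                previous[j - 1] + cost,  # substitute (or match when cost == 0)
            ))
        previous = current
    return previous[-1]


def normalized_levenshtein(a: Sequence, b: Sequence) -> float:
    """Assumed NLD: edit distance divided by the length of the longer sequence."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0  # two empty sequences are treated as identical
    return levenshtein(a, b) / longest


if __name__ == "__main__":
    # Orthographic examples from the abstract
    print(normalized_levenshtein("paper", "papel"))  # one substitution over 5 letters
    print(normalized_levenshtein("house", "casa"))   # only "s" can be aligned
```

Running the script prints 0.2 for "paper"/"papel" and 0.8 for "house"/"casa", matching the intuition that the first pair overlaps heavily and the second barely at all.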

PHOR-in-One: A multilingual lexical database with PHonological, ORthographic and PHonographic word similarity estimates in four languages