Abstract: The availability of computationally lightweight but high-quality translators prompts consideration of new applications that address neglected languages. For projects with protected or personal data, translating less popular or low-resource languages requires specific compliance checks before text is posted to a public translation API. In these cases, locally run translators can provide reasonable, cost-effective solutions when deployed as an army of offline, small-scale pair translators. Treating the task like handling a specialist’s dialect, this research illustrates translating two historically interesting but obfuscated languages: 1) hacker-speak (“l33t”) and 2) reverse (or “mirror”) writing as practiced by Leonardo da Vinci. The work generalizes a deep learning architecture to translatable variants of hacker-speak with lite, medium, and hard vocabularies. The original contribution highlights a fluent translator of hacker-speak in under 50 megabytes and demonstrates a companion text generator for augmenting future datasets with more than a million bilingual sentence pairs. A primary motivation stems from the need to understand and archive the evolution of the international computer community, one that continuously enhances its talent for speaking openly but in hidden contexts. These bilingual sentence pairs train deep learning models based on a long short-term memory recurrent neural network (LSTM-RNN). The work extends previous research demonstrating an English-to-foreign translation service built from as few as 10,000 bilingual sentence pairs. It further solves the equivalent translation problem in twenty-six additional (non-obfuscated) languages and rank-orders those models by quantitative proficiency, with Italian as the most successful and Mandarin Chinese as the most challenging.
For neglected languages, the method prototypes novel services for smaller niche translations such as Kabyle (Algerian dialect), which has between 5 and 7 million speakers but has not yet reached development in most enterprise translators. One anticipates extending this approach to other important dialects, such as translating technical (medical or legal) jargon, processing health records, or handling the many dialects collected from specialized domains (mixed languages like “Spanglish”, acronym-laden Twitter feeds, or urban slang).
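As a rough illustration of how a companion text generator for bilingual sentence pairs might work, the Python sketch below produces (English, obfuscated) pairs for hacker-speak and mirror writing. The substitution tables, the function names (`to_leet`, `to_mirror`, `make_pairs`), and the `rate` parameter are assumptions made for this example only; the abstract does not specify the vocabularies or the generator's design.

```python
import random

# Hypothetical substitution tables (not taken from the paper): a "lite"
# vocabulary swaps only vowels, a "medium" one adds a few consonants.
LEET_LITE = {"a": "4", "e": "3", "i": "1", "o": "0"}
LEET_MEDIUM = {**LEET_LITE, "s": "5", "t": "7", "g": "9"}


def to_leet(sentence, table, rate=1.0, rng=None):
    """Replace characters found in `table` with probability `rate`."""
    if rng is None:
        rng = random.Random(0)  # fixed seed keeps the output reproducible
    out = []
    for ch in sentence:
        sub = table.get(ch.lower())
        if sub is not None and rng.random() < rate:
            out.append(sub)
        else:
            out.append(ch)
    return "".join(out)


def to_mirror(sentence):
    """Reverse the character order, a simple stand-in for mirror writing."""
    return sentence[::-1]


def make_pairs(sentences, transform):
    """Build (source, target) sentence pairs for translator training."""
    return [(s, transform(s)) for s in sentences]


if __name__ == "__main__":
    corpus = ["Elite hacker", "Leonardo wrote in reverse"]
    for src, tgt in make_pairs(corpus, lambda s: to_leet(s, LEET_LITE)):
        print(f"{src}\t{tgt}")
    for src, tgt in make_pairs(corpus, to_mirror):
        print(f"{src}\t{tgt}")
```

Setting `rate` below 1.0 yields partially obfuscated variants of the same sentence, which is one plausible way to grow a small seed corpus into a much larger set of pairs.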
