Abstract: Learning Japanese can enhance competitiveness in a globalized economy. We address three problems in Japanese language teaching: limited open-source teaching resources, cumbersome teaching tasks, and reliance on a single teaching model. We propose a hybrid Japanese teaching aid system with multi-source information fusion mapping, which improves the efficiency of Japanese teaching and reduces tedious manual teaching procedures. The system has two Japanese language recognition branches: a text recognition branch and a voice sequence recognition branch. In the text recognition branch, we integrate attention mechanisms with long short-term memory (LSTM) networks as the base network for Japanese character recognition. In addition, we set up separate text feature recognition systems for computer-typed and handwritten Japanese to prevent feature overlap. In the voice sequence recognition branch, we combine a gated memory unit with an encoder; the network retains the structure of a deep neural network and uses residual connections in the gating unit to avoid the vanishing gradient problem. At the end of the system, a softmax layer connects the text recognition and voice recognition networks to form the Japanese language teaching aid system. To verify the efficiency of the system, we ran experiments on a public Japanese text recognition dataset and a public voice recognition dataset. To match the practical application of the system, we also built our own dataset following the standards of the public datasets and validated the system on it. For comparison, we selected the six most representative Japanese recognition algorithms. To keep the comparison fair, each algorithm is trained and tuned in its own experimental environment. The experimental results show that our method significantly outperforms the other methods and has better system stability.
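The two architectural ideas named in the abstract, a residual connection around a gated unit and a softmax layer joining the two branches, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `residual_gate` and `fuse`, the choice of summing per-class scores before the softmax, and the toy numbers are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def residual_gate(x, candidate, gate_logits):
    """Gated update with a skip connection: y = x + sigmoid(g) * candidate.

    The identity path (x) lets gradients flow even when the gate is
    nearly closed, which is the usual motivation for residual links.
    """
    gates = [1.0 / (1.0 + math.exp(-g)) for g in gate_logits]
    return [xi + gi * ci for xi, gi, ci in zip(x, gates, candidate)]


def fuse(text_logits, voice_logits):
    """Hypothetical late fusion: sum per-class scores, then apply softmax."""
    combined = [t + v for t, v in zip(text_logits, voice_logits)]
    return softmax(combined)


# Toy usage with made-up scores for three candidate classes.
hidden = residual_gate([1.0, 2.0], [2.0, 2.0], [0.0, 0.0])
probs = fuse([2.0, 0.5, -1.0], [1.0, 0.0, 0.5])
```

Summing the branch scores is only one possible way to connect the two networks; concatenating branch features before a shared output layer would be an equally plausible reading of the abstract.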

Hybrid Japanese Language Teaching Aid System with Multi-Source Information Fusion Mapping
