Abstract: Current text-to-image synthesis systems pair a text encoder with an image generator. The task is challenging because of the domain gap between natural language and vision. Most research focuses on producing photo-realistic images, while the language side has received less attention. Much of the current research uses English as the input text, although many other languages are spoken around the world. Bahasa Indonesia, the official language of Indonesia, is widely used and has been taught in the Philippines, Australia, and Japan. Translating an existing dataset or creating a new one in another language at good quality is costly. Research in this area is necessary because we need to examine how an image generator performs in languages other than English, in addition to how photo-realistic its images are. To this end, we translate the CUB dataset into Bahasa Indonesia using Google Translate and manual human translation. We use Sentence BERT as the text encoder and FastGAN as the image generator. FastGAN uses many skip excitation modules and an auto-encoder to generate images at a resolution of 512x512x3, which is twice as large as that of the current state-of-the-art model (Zhang, Xu, Li, Zhang, Wang, Huang and Metaxas, 2019). We obtain an Inception Score of 4.76 ± 0.43 and a Fréchet Inception Distance of 46.401, which are comparable to those of current English text-to-image generation models. The mean opinion score is 3.22 out of 5, which indicates that the generated images are acceptable to humans. Link to source code: https://github.com/share424/Indonesian-Text-to-Image-synthesis-with-Sentence-BERT-and-FastGAN
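
For readers unfamiliar with the Inception Score reported above, the following is a minimal sketch of its standard definition, exp(E_x[KL(p(y|x) || p(y))]), computed from per-image class probabilities. The function name `inception_score` and the toy inputs are illustrative assumptions, not taken from the paper's code; in practice the probabilities would come from a pretrained Inception classifier, and the score is commonly averaged over several splits of the generated images.

```python
import math


def inception_score(probs):
    """Inception Score from a list of per-image class-probability vectors p(y|x).

    Hypothetical helper for illustration; assumes each vector sums to 1.
    """
    n = len(probs)
    k = len(probs[0])
    # Marginal class distribution p(y), averaged over all images.
    marginal = [sum(p[j] for p in probs) / n for j in range(k)]
    # Mean KL divergence between each conditional p(y|x) and the marginal p(y).
    mean_kl = 0.0
    for p in probs:
        mean_kl += sum(
            pj * math.log(pj / mj) for pj, mj in zip(p, marginal) if pj > 0
        )
    mean_kl /= n
    return math.exp(mean_kl)


# Confident and diverse predictions give a high score;
# identical, uninformative predictions give the minimum score of 1.
confident_diverse = [[1.0, 0.0], [0.0, 1.0]]
uninformative = [[0.5, 0.5], [0.5, 0.5]]
print(inception_score(confident_diverse))
print(inception_score(uninformative))
```

A higher score means the classifier is confident about each image while the set as a whole covers many classes; the score is bounded above by the number of classes.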

Stemmer and phonotactic rules to improve n-gram tagger-based Indonesian phonemicization

Neural Grapheme-To-Phoneme Conversion with Pre-Trained Grapheme Models

Toward a Standardized and More Accurate Indonesian Part-of-Speech Tagging

Grapheme-to-Phoneme Transformer Model for Transfer Learning Dialects

Dynamic decoding and dual synthetic data for automatic correction of grammar in low-resource scenario

Investigating Bi-LSTM and CRF with POS Tag Embedding for Indonesian Named Entity Tagger

Improving Grapheme-to-Phoneme Conversion through In-Context Knowledge Retrieval with Large Language Models

Implementation of The Indonesian Language Stemming Algorithm in Twitter Data Preprocessing. Case Study: Twitter Wargabanua and Instakalsel

R-G2p: Evaluating and Enhancing Robustness of Grapheme to Phoneme Conversion by Controlled Noise Introducing and Contextual Information Incorporation

Improving grapheme-to-phoneme conversion by learning pronunciations from speech recordings

Transformer based Grapheme-to-Phoneme Conversion

idT5: Indonesian Version of Multilingual T5 Transformer

End-to-end Indonesian speech recognition with convolutional and gated recurrent units

Reduce Indonesian Vocabularies with an Indonesian Sub-word Separator

Neural Machine Translation for Multilingual Grapheme-to-Phoneme Conversion

Indonesian Text-to-Image Synthesis with Sentence-BERT and FastGAN

LLM-Powered Grapheme-to-Phoneme Conversion: Benchmark and Case Study

Developing an Online Self-learning System of Indonesian Pronunciation for Foreign Learners

IndoNLG: Benchmark and Resources for Evaluating Indonesian Natural Language Generation

LiteG2P: A fast, light and high accuracy model for grapheme-to-phoneme conversion