Afro-MNIST: Synthetic generation of MNIST-style datasets for low-resource languages

Daniel J Wu, Andrew C Yang, Vinay U Prabhu
DOI: https://doi.org/10.48550/arXiv.2009.13509
2020-09-29
Abstract: We present Afro-MNIST, a set of synthetic MNIST-style datasets for four orthographies used in Afro-Asiatic and Niger-Congo languages: Ge'ez (Ethiopic), Vai, Osmanya, and N'Ko. These datasets serve as "drop-in" replacements for MNIST. We also describe and open-source a method for synthetic MNIST-style dataset generation from single examples of each digit. These datasets can be found at https://github.com/Daniel-Wu/AfroMNIST. We hope that MNIST-style datasets will be developed for other numeral systems, and that these datasets vitalize machine learning education in nations underrepresented in the research community.
Computer Vision and Pattern Recognition, Machine Learning