CODP-1200: An AIGC based benchmark for assisting in child language acquisition

Guannan Leng, Guowei Zhang, Yu-Jie Xiong, Jue Chen
DOI: https://doi.org/10.1016/j.displa.2023.102627
IF: 3.074
2023-12-31
Displays
Abstract: AIGC (Artificial Intelligence Generated Content) is an emerging AI technology that encompasses tasks such as text-to-image, text-to-text, and image-to-text generation. During language acquisition, some children face challenges such as delayed language development, limited vocabulary, and poor expressive ability. The "Look and Speak" method addresses these challenges by letting children learn and express language through observing images. In this paper, we build CODP-1200, a benchmark dataset for assisting child language acquisition, which is curated and augmented using AIGC techniques. The dataset consists of 1,200 children's cartoon images paired with 6,000 sentences that describe them. First, we selected twelve Chinese language textbooks, spanning the first to the sixth grade of the primary compulsory education curriculum, to construct the foundational corpus. Next, two well-known large language models, ChatGPT and SparkDesk, are used to augment the original data. Finally, ERNIE-ViLG is used to generate children's-style images corresponding to the textual descriptions. In addition, based on the proposed dataset, we introduce a benchmark approach called DDMXCap, a diffusion-based model for image captioning (image to text). Experimental results demonstrate that our method achieves promising performance on children's image captioning tasks and provides a standardized learning process for child language acquisition. The implementation code and the dataset are available on GitHub: https://github.com/Leng-bingo/Chinese-Child-Captions
engineering, electrical & electronic; instruments & instrumentation; optics; computer science, hardware & architecture
What problem does this paper attempt to address?
The paper addresses difficulties that some children encounter in language acquisition, such as delayed language development, limited vocabulary, and poor expressive ability. To tackle these problems, the research team constructed a dataset named CODP-1200, which uses AI-generated content (AIGC) technology to assist children's language acquisition. Specifically, CODP-1200 contains 1,200 cartoon-style images and 6,000 corresponding descriptive sentences. The source text was selected from Chinese language textbooks for grades 1 to 6. The team then used the large language models ChatGPT and SparkDesk for data augmentation and employed ERNIE-ViLG to generate cartoon-style images suited to children's comprehension abilities. The paper also proposes DDMXCap, a diffusion-based image captioning method. DDMXCap incorporates the X-Linear attention module, which is intended to capture image features more accurately and to help visually impaired children obtain more precise information from images. Experimental results show that the method performs well on captioning children's images and improves the relevant evaluation metrics. In summary, this research fills a gap in datasets designed specifically for children's image captioning and provides an effective captioning method as a baseline.
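The dataset figures quoted above (1,200 images, 6,000 sentences) imply an average of five descriptive sentences per image. The following minimal Python sketch illustrates one way such image-caption records could be represented and counted. The class name, field names, file paths, and the assumption of exactly five captions per image are all hypothetical; the source gives only the totals and does not describe the actual file layout of the released dataset.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class CaptionedImage:
    """One hypothetical dataset record: an image and its descriptive sentences."""
    image_path: str                                     # illustrative path, not the real layout
    captions: List[str] = field(default_factory=list)  # sentences describing the image


def summarize(records: List[CaptionedImage]) -> Tuple[int, int, float]:
    """Return (number of images, number of captions, mean captions per image)."""
    n_images = len(records)
    n_captions = sum(len(r.captions) for r in records)
    mean = (n_captions / n_images) if n_images else 0.0
    return n_images, n_captions, mean


def build_toy_dataset(n_images: int = 1200, per_image: int = 5) -> List[CaptionedImage]:
    """Build placeholder records. Uniformly five captions per image is an
    assumption derived from the 6,000 / 1,200 ratio, not a documented fact."""
    return [
        CaptionedImage(
            image_path=f"images/{i:04d}.png",
            captions=[f"placeholder caption {j} for image {i}" for j in range(per_image)],
        )
        for i in range(n_images)
    ]


if __name__ == "__main__":
    images, captions, mean = summarize(build_toy_dataset())
    print(images, captions, mean)
```

Keeping the captions as a list on each record, rather than as separate rows, makes it straightforward to check that every image has at least one description before training a captioning model.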