Abstract: Recent advances in multimodal and generative AI open the door to solving key challenges for persons with learning disabilities. To assist individuals who have difficulty with visual or auditory perception, this paper designs and develops a multimodal AI agent. We aim to enable persons with Visual or Auditory Processing Disorders to learn and communicate. We do this by exploring a design that transforms information across visual and language modalities, a design that recent generative multimodal AI makes realizable. Based on each individual's needs, the AI agent dynamically adapts the human-computer interaction model. For instance, a child with Visual Processing Disorder (VPD) has a hindered ability to make sense of information taken in through the eyes, so the agent transforms visual information into auditory interaction. Likewise, a person with Central Auditory Processing Disorder (CAPD) has difficulty analyzing information taken in through the ears, so the agent dynamically translates speech into visual cues. The agent thus adapts to the strengths and abilities of the individual. To enable students with VPD to learn, the design allows the student to ask questions about an image. This is realized as a Visual Question Answering task with vision-language transformer models. We explore interactive multimodal conversations using few-shot learning and in-context instruction tuning of multimodal large language models to address difficulty in visual reasoning. To enable persons with CAPD to learn, the design translates audio lectures into visual cues. These visual cues combine three elements: words obtained through speech recognition and rephrased into simpler words by a large language model, images obtained through cross-modal retrieval to address auditory memory challenges, and AI-generated images. To identify the strengths of each child, we also explore latent-space arithmetic over multimodal embeddings to link AI across senses. To integrate the proposed design into mainstream use, we explore an inclusive approach based on universal design that extends the use case to AI assistants for children with different learning styles, such as visual learners or auditory learners. To enable future research on the proposed design, we explore an architecture that composes a pipeline of AI models and connects with external systems via plugin connectors. We implement lab-scale prototypes of this design and present a demo on the project webpage at https://sites.google.com/view/multimodallearningdisability.
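
The adaptive routing and model-pipeline composition described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation. The `Pipeline` class, the `ROUTES` table, the `adapt` function, and every stage function are hypothetical names introduced here, and each stage is a string-returning placeholder standing in for a real model (image description, speech synthesis, speech recognition, LLM rephrasing, image retrieval or generation).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A stage maps one piece of content to another. Real stages would wrap models.
Stage = Callable[[str], str]


@dataclass
class Pipeline:
    """An ordered composition of stages, applied left to right."""
    stages: List[Stage]

    def run(self, content: str) -> str:
        for stage in self.stages:
            content = stage(content)
        return content


# Placeholder stages (assumptions): each returns a tagged string, not real output.
def describe_image(image_ref: str) -> str:
    return f"description of {image_ref}"


def synthesize_speech(text: str) -> str:
    return f"audio<{text}>"


def transcribe_speech(audio_ref: str) -> str:
    return f"transcript of {audio_ref}"


def simplify_text(text: str) -> str:
    return f"simplified({text})"


def attach_images(text: str) -> str:
    return f"{text} + retrieved/generated images"


# Routing table (assumption): (learner profile, input modality) -> pipeline.
ROUTES: Dict[Tuple[str, str], Pipeline] = {
    # VPD: visual information becomes auditory interaction.
    ("VPD", "image"): Pipeline([describe_image, synthesize_speech]),
    # CAPD: speech becomes simplified words plus images.
    ("CAPD", "speech"): Pipeline([transcribe_speech, simplify_text, attach_images]),
}


def adapt(profile: str, modality: str, content: str) -> str:
    """Transform content for the learner; pass it through if no route applies."""
    pipeline = ROUTES.get((profile, modality))
    return pipeline.run(content) if pipeline else content


if __name__ == "__main__":
    print(adapt("VPD", "image", "diagram.png"))
    print(adapt("CAPD", "speech", "lecture.wav"))
```

Keeping the routes in a table rather than in branching code means a new learner profile or an external connector could be added as one more entry, which is one plausible reading of the plugin-style architecture the abstract mentions.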

Design of Generative Multimodal AI Agents to Enable Persons with Learning Disability
