Abstract: Foundation models have had a big impact in recent years and billions of dollars are being invested in them in the current AI boom. The more popular ones, such as ChatGPT, are trained on large amounts of Internet data. However, it is becoming apparent that this data is likely to be exhausted soon, and technology companies are looking for new sources of data to train the next generation of foundation models. Reinforcement learning, RAG, prompt engineering and cognitive modelling are often used to fine-tune and augment the behaviour of foundation models. These techniques have been used to replicate people, such as Caryn Marjorie. These chatbots are not based on people's actual emotional and physiological responses to their environment, so they are, at best, a surface-level approximation to the characters they are imitating. To address these issues, we have developed a recording rig that captures what the wearer is seeing and hearing as well as their skin conductance (GSR), facial expression and brain state (14-channel EEG). AI algorithms are used to process this data into a rich picture of the environment and internal states of the subject. Foundation models trained on this data could replicate human behaviour much more accurately than the personality models that have been developed so far. This type of model has many potential applications, including recommendation, personal assistance, GAN systems, dating and recruitment. This paper gives some background to this work and describes the recording rig and preliminary tests of its functionality. It then suggests how a new type of foundation model could be created from the data captured by the rig and outlines some applications. Data gathering and model training are expensive, so we are currently working on the launch of a start-up that could raise funds for the next stage of the project.

What problem does this paper attempt to address?

The paper aims to address two main issues faced by current foundation models:

1. **Limitations of Data Sources**: Today's widely used large language models (such as ChatGPT) are trained primarily on vast amounts of text data from the internet. This supply is likely to be exhausted soon, so technology companies are looking for new sources of data to train the next generation of foundation models.
2. **Superficiality of Personality Simulation**: Existing techniques (such as reinforcement learning, retrieval-augmented generation [RAG], and prompt engineering) can make foundation models mimic specific individuals. However, these imitations are not based on the individuals' actual emotional and physiological responses, so they achieve, at best, a surface-level approximation.

To address these issues, the research team developed a recording rig (the First-person Recorder) that captures what the wearer sees and hears together with their emotional and physiological responses to the environment. Specifically, the rig records video and audio through a camera and microphone, and captures skin conductance (galvanic skin response, GSR), facial expression, and brain state (14-channel electroencephalogram, EEG). AI algorithms process this data into a picture of the wearer's environment and internal states, from which a new type of foundation model, the First-Person Foundation Model (FPFM), could be constructed. Such a model could replicate human behavior more accurately than existing personality models, with potential applications in recommendation, personal assistance, generative adversarial network (GAN) systems, dating, and recruitment.
In short, the core goal of the paper is to overcome the challenges faced by existing foundation models by introducing a new data collection method and demonstrating how to use this data to train a new generation of foundation models that are more precise and closer to actual human behavior.
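
As a rough illustration of the kind of time-aligned record such a rig might produce, the Python sketch below bundles the modalities named in the abstract into one sample and flattens it into a line of text. All names here (`FirstPersonSample`, `to_training_text`, the field names, and the text layout) are hypothetical and are not taken from the paper; only the modalities and the 14-channel EEG count come from the abstract.

```python
from dataclasses import dataclass
from typing import Tuple

EEG_CHANNELS = 14  # channel count stated in the abstract


@dataclass(frozen=True)
class FirstPersonSample:
    """One time-aligned snapshot of environment and internal state (hypothetical layout)."""
    timestamp_s: float          # seconds since the start of the recording
    scene_description: str      # assumed output of a vision model run on the camera feed
    heard_text: str             # assumed output of speech recognition on the microphone feed
    gsr: float                  # skin conductance reading (units assumed)
    facial_expression: str      # assumed label from an expression classifier
    eeg: Tuple[float, ...]      # one value per EEG channel

    def __post_init__(self) -> None:
        # Reject samples that do not carry exactly one value per EEG channel.
        if len(self.eeg) != EEG_CHANNELS:
            raise ValueError(
                f"expected {EEG_CHANNELS} EEG channels, got {len(self.eeg)}"
            )


def to_training_text(sample: FirstPersonSample) -> str:
    """Flatten a sample into a single text line (an assumed format, for illustration only)."""
    eeg_mean = sum(sample.eeg) / len(sample.eeg)
    return (
        f"[t={sample.timestamp_s:.1f}s] "
        f"sees: {sample.scene_description}; "
        f"hears: {sample.heard_text}; "
        f"expression: {sample.facial_expression}; "
        f"gsr: {sample.gsr:.2f}; "
        f"eeg_mean: {eeg_mean:.2f}"
    )


example = FirstPersonSample(
    timestamp_s=1.5,
    scene_description="a kitchen table with a cup",
    heard_text="would you like some tea?",
    gsr=2.0,
    facial_expression="smile",
    eeg=tuple([1.0] * EEG_CHANNELS),
)
print(to_training_text(example))
```

Summarising the EEG as a single mean is only a placeholder; the paper does not specify how brain-state data would be encoded for training.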

Recording First-person Experiences to Build a New Type of Foundation Model
