Abstract: Multi-modal Large Language Models (MLLMs) have exhibited impressive capabilities. However, many deficiencies of MLLMs relative to human intelligence have recently been found, $\textit{e.g.}$, hallucination. To drive MLLM research, the community has dedicated effort to building larger benchmarks with complex tasks. In this paper, we propose benchmarking an essential but usually overlooked form of intelligence: $\textbf{association}$, a basic human capability to link observation with prior practice memory. To comprehensively investigate MLLMs' performance on association, we formulate the association task and devise a standard benchmark based on adjective and verb semantic concepts. Instead of relying on costly data annotation and curation, we propose a convenient $\textbf{annotation-free}$ construction method that transforms a general dataset for our association tasks. We also devise a rigorous data refinement process to eliminate confusion in the raw dataset. Building on this database, we establish three levels of association tasks: single-step, synchronous, and asynchronous associations. Moreover, we conduct a comprehensive investigation into MLLMs' zero-shot association capabilities across multiple dimensions, including three distinct memory strategies, both open-source and closed-source MLLMs, cutting-edge Mixture-of-Experts (MoE) models, and the involvement of human experts. Our systematic investigation shows that current open-source MLLMs consistently exhibit poor capability on our association tasks, and even the current state-of-the-art GPT-4V(vision) shows a significant gap compared to humans. We believe our benchmark will pave the way for future MLLM studies. $\textit{Our data and code are available at:}$ https://mvig-rhos.com/llm_inception.

The "LLM World of Words" English free association norms generated by large language models

Can large language models help augment English psycholinguistic datasets?

Language models and psychological sciences

Cognitive Bias in Decision-Making with LLMs

The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs

Investigating Subtler Biases in LLMs: Ageism, Beauty, Institutional, and Nationality Bias in Generative Models

The African Woman is Rhythmic and Soulful: An Investigation of Implicit Biases in LLM Open-ended Text Generation

A Survey on Human-Centric LLMs

Understanding and Mitigating Language Confusion in LLMs

Language Model Council: Democratically Benchmarking Foundation Models on Highly Subjective Tasks

Modeling Human Subjectivity in LLMs Using Explicit and Implicit Human Factors in Personas

LLM-Human Pipeline for Cultural Context Grounding of Conversations

The LLM Effect: Are Humans Truly Using LLMs, or Are They Being Influenced By Them Instead?

A Comprehensive Evaluation of Cognitive Biases in LLMs

Profiling Bias in LLMs: Stereotype Dimensions in Contextual Word Embeddings

Under the Surface: Tracking the Artifactuality of LLM-Generated Data

Do LLMs exhibit human-like response biases? A case study in survey design

Assessing Hidden Risks of LLMs: An Empirical Study on Robustness, Consistency, and Credibility

Cognitive phantoms in LLMs through the lens of latent variables

Investigating Context Effects in Similarity Judgements in Large Language Models

"Im not Racist but...": Discovering Bias in the Internal Knowledge of Large Language Models