Entity Embeddings : Perspectives Towards an Omni-Modality Era for Large Language Models

Eren Unlu, Unver Ciftci

2023-10-28

Abstract: Large Language Models (LLMs) are evolving to integrate multiple modalities, such as text, image, and audio into a unified linguistic space. We envision a future direction based on this framework where conceptual entities defined in sequences of text can also be imagined as modalities. Such a formulation has the potential to overcome the cognitive and computational limitations of current models. Several illustrative examples of such potential implicit modalities are given. Along with vast promises of the hypothesized structure, expected challenges are discussed as well.

Machine Learning

What problem does this paper attempt to address?

The paper explores a future direction for large language models (LLMs), which are increasingly integrating multiple modalities (such as text, images, and audio) into a unified linguistic space. It proposes that conceptual entities defined in sequences of text can themselves be treated as modalities, a formulation expected to overcome cognitive and computational limitations of current models. The paper discusses two recently proposed model architectures, compares their advantages and limitations, and evaluates their potential as a foundation for future omni-modality architectures. It also introduces the concept of "entity embeddings," where an "entity" is any fragment of information that can be represented as a finite number of interleaved tokens. The potential for recursive and interconnected use of entity embeddings is discussed, along with several hypothetical application scenarios. In summary, the paper explores how treating various entities as modalities could enhance the capabilities of LLMs, and it outlines the technical challenges and prospects of achieving this goal.
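The idea of an entity as a finite span of tokens interleaved with ordinary text can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the hash-based `toy_token_embedding`, the mean-pooling `mean_pool` used as a stand-in for a real entity encoder, the function `interleave_entities`, and the example sentence with its date and place spans.

```python
import hashlib

DIM = 4  # toy embedding width, chosen only for illustration


def toy_token_embedding(token, dim=DIM):
    """Deterministic stand-in for a learned token embedding table (hypothetical)."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:dim]]


def mean_pool(vectors):
    """Average vectors component-wise; a placeholder for a real entity encoder."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def interleave_entities(tokens, entity_spans):
    """Replace each (start, end) token span with one pooled entity embedding.

    Spans are half-open, sorted, and non-overlapping. Returns a list of
    (kind, vector) pairs where kind is "token" or "entity".
    """
    sequence = []
    position = 0
    for start, end in entity_spans:
        # Plain tokens before the entity keep their own embeddings.
        for tok in tokens[position:start]:
            sequence.append(("token", toy_token_embedding(tok)))
        # The whole span collapses into a single entity vector.
        span_vectors = [toy_token_embedding(tok) for tok in tokens[start:end]]
        sequence.append(("entity", mean_pool(span_vectors)))
        position = end
    # Remaining plain tokens after the last entity.
    for tok in tokens[position:]:
        sequence.append(("token", toy_token_embedding(tok)))
    return sequence


tokens = "the meeting on 28 October 2023 was held in Paris".split()
# Hypothetical entity spans: a date (tokens 3..5) and a place (token 9).
mixed = interleave_entities(tokens, [(3, 6), (9, 10)])
kinds = [kind for kind, _ in mixed]
```

In this sketch the ten input tokens become a sequence of eight vectors, two of which stand for whole entities. A real system would presumably replace the pooling step with a dedicated, learned encoder per entity type.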
