Entity Embeddings: Perspectives Towards an Omni-Modality Era for Large Language Models

Eren Unlu, Unver Ciftci
2023-10-28
Abstract: Large Language Models (LLMs) are evolving to integrate multiple modalities, such as text, image, and audio, into a unified linguistic space. We envision a future direction based on this framework where conceptual entities defined in sequences of text can also be imagined as modalities. Such a formulation has the potential to overcome the cognitive and computational limitations of current models. Several illustrative examples of such potential implicit modalities are given. Along with vast promises of the hypothesized structure, expected challenges are discussed as well.
Machine Learning
What problem does this paper attempt to address?
The paper explores a future development direction for large language models (LLMs), which are already evolving to integrate multiple modalities (such as text, images, and audio) into a unified linguistic space. It proposes that conceptual entities defined in sequences of text can themselves be treated as modalities, a formulation the authors expect could overcome the cognitive and computational limitations of current models. The paper discusses two recently proposed model architectures, compares their advantages and limitations, and evaluates their potential as the foundation for future omni-modality architectures. It also introduces the concept of "entity embeddings," where an "entity" is any fragment of information that can be represented as a finite number of interleaved tokens, and discusses how such embeddings could be used recursively and in interconnected ways, along with several hypothetical application scenarios. In summary, the paper explores how treating various entities as modalities could enhance the capabilities of LLMs, and it outlines the technical challenges and prospects of achieving this goal.
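To make the idea of an entity occupying a slot among ordinary tokens more concrete, here is a minimal Python sketch. It is an illustration only and not the authors' method: the summary does not say how an entity embedding would be computed, so the mean-pooling encoder, the toy embedding width `DIM`, and the helper names `token_embedding`, `entity_embedding`, and `interleave` are all assumptions made for this example.

```python
import random

DIM = 4  # toy embedding width (assumption, chosen only for readability)


def token_embedding(token):
    """Deterministic toy vector for a token (stand-in for a learned table)."""
    rng = random.Random(token)  # seeding with the string keeps runs repeatable
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]


def entity_embedding(tokens):
    """Hypothetical entity encoder: mean-pool the token vectors of a span."""
    vectors = [token_embedding(t) for t in tokens]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def interleave(tokens, entity_spans):
    """Replace each [start, end) span with one entity vector.

    Returns (kind, label, vector) items in sequence order, so entity
    vectors sit in the same sequence as ordinary token vectors.
    """
    span_ends = {}
    for start, end in entity_spans:
        if end <= start:
            raise ValueError("entity spans must be non-empty")
        span_ends[start] = end

    sequence, i = [], 0
    while i < len(tokens):
        if i in span_ends:
            span = tokens[i:span_ends[i]]
            sequence.append(("entity", " ".join(span), entity_embedding(span)))
            i = span_ends[i]
        else:
            sequence.append(("token", tokens[i], token_embedding(tokens[i])))
            i += 1
    return sequence


tokens = "the Eiffel Tower is in Paris".split()
mixed = interleave(tokens, [(1, 3)])  # treat "Eiffel Tower" as one entity
for kind, label, vector in mixed:
    print(kind, repr(label), [round(x, 2) for x in vector])
```

In this toy setup the six input tokens become a five-item sequence in which the two-token span "Eiffel Tower" is carried by a single vector. A real system would presumably learn the entity encoder rather than average token vectors; that choice is left open here.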