A Pattern to Align Them All: Integrating Different Modalities to Define Multi-Modal Entities

Gianluca Apriceno, Valentina Tamma, Tania Bailoni, Jacopo de Berardinis, Mauro Dragoni
2024-10-18
Abstract: The ability to reason with and integrate different sensory inputs is the foundation underpinning human intelligence and it is the reason for the growing interest in modelling multi-modal information within Knowledge Graphs. Multi-Modal Knowledge Graphs extend traditional Knowledge Graphs by associating an entity with its possible modal representations, including text, images, audio, and videos, all of which are used to convey the semantics of the entity. Despite the increasing attention that Multi-Modal Knowledge Graphs have received, there is a lack of consensus about the definitions and modelling of modalities, whose definition is often determined by application domains. In this paper, we propose a novel ontology design pattern that captures the separation of concerns between an entity (and the information it conveys), whose semantics can have different manifestations across different media, and its realisation in terms of a physical information entity. By introducing this abstract model, we aim to facilitate the harmonisation and integration of different existing multi-modal ontologies, which is crucial for many intelligent applications across different domains spanning from medicine to digital humanities.
Artificial Intelligence
What problem does this paper attempt to address?
The paper mainly addresses the modelling and integration of Multi-Modal Knowledge Graphs (MMKGs). Specifically, it targets three problems:

1. **Lack of a unified definition of "modality"**:
   - Although multi-modal information is widely used in many fields, the literature does not agree on what a "modality" is. Different application domains understand modality differently, which leads to semantic mismatches. For example, in some works a modality refers only to images or audio, while in others it also covers more complex representations, such as linguistic features.
2. **Existing MMKGs tend to model homogeneous knowledge**:
   - Most existing MMKGs cover a single type of modal information, such as images, videos, or audio, and ignore the interactions and associations between different modalities. This homogeneity limits their expressive power and the scenarios they can serve, especially where cross-modal fusion is required.
3. **Poor reusability and interoperability of existing knowledge graphs**:
   - Because most MMKGs are designed for specific tasks, their scope and content depend on the target application, which makes them hard to reuse in other domains or to align and integrate with other MMKGs. This limits their wider use and interoperability.

To address these problems, the paper proposes a new Ontology Design Pattern (ODP) that serves as an abstract upper-level ontology for unifying and connecting multi-modal ontologies from different domains. The pattern gives a general definition of modality by separating the specific manifestations of a modality from its semantics, and it supports the co-existence and integration of different modalities.
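To make the separation of concerns more concrete, here is a minimal Python sketch, assuming a simplified three-layer reading of the pattern: an entity carries the semantics, one manifestation per modality expresses it, and physical realisations (files or resources) carry each manifestation. All class and function names (`Modality`, `Entity`, `Manifestation`, `PhysicalRealisation`, `integrate`) and the example IRIs are hypothetical and are not the paper's vocabulary. The sketch also shows how records from two graphs that label the same modality differently ("img" vs "picture") could be merged once their local labels are mapped to a shared modality.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical vocabulary for illustration only; these names are not the
# classes or properties defined by the paper's ontology design pattern.


@dataclass(frozen=True)
class Modality:
    """A shared modality label (e.g. 'image', 'text'), independent of format."""
    name: str


@dataclass(frozen=True)
class PhysicalRealisation:
    """A concrete information carrier: a resource URI plus its media type."""
    uri: str
    media_type: str


@dataclass
class Manifestation:
    """How an entity is expressed in one modality, realised by zero or more files."""
    modality: Modality
    realisations: list[PhysicalRealisation] = field(default_factory=list)


@dataclass
class Entity:
    """The entity and the information it conveys, kept apart from any medium."""
    iri: str
    manifestations: dict[Modality, Manifestation] = field(default_factory=dict)

    def add(self, modality: Modality, realisation: PhysicalRealisation) -> None:
        # Reuse the manifestation for this modality if it already exists, so
        # several files (e.g. JPEG and PNG) can realise the same image manifestation.
        manifestation = self.manifestations.setdefault(modality, Manifestation(modality))
        if realisation not in manifestation.realisations:
            manifestation.realisations.append(realisation)


def integrate(records, modality_map):
    """Merge (entity_iri, local_label, uri, media_type) records from several
    source graphs, translating each graph's local modality label through
    `modality_map` into a shared Modality."""
    entities: dict[str, Entity] = {}
    for iri, local_label, uri, media_type in records:
        modality = modality_map[local_label]
        entity = entities.setdefault(iri, Entity(iri))
        entity.add(modality, PhysicalRealisation(uri, media_type))
    return entities


# Usage: two hypothetical source graphs call images "img" and "picture".
IMAGE, TEXT = Modality("image"), Modality("text")
modality_map = {"img": IMAGE, "picture": IMAGE, "caption": TEXT}
records = [
    ("ex:MonaLisa", "img", "https://example.org/a.jpg", "image/jpeg"),
    ("ex:MonaLisa", "picture", "https://example.org/b.png", "image/png"),
    ("ex:MonaLisa", "caption", "https://example.org/c.txt", "text/plain"),
]
kg = integrate(records, modality_map)
mona = kg["ex:MonaLisa"]
print(sorted(m.name for m in mona.manifestations))  # ['image', 'text']
print(len(mona.manifestations[IMAGE].realisations))  # 2
```

Keeping the modality as its own object, rather than conflating it with a file format, is what lets two image files in different formats attach to the same image manifestation of one entity.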
In addition, the pattern is flexible and extensible, allowing users to define new modalities and to express the relationships between them.

### Core contributions of the paper

1. **A new multi-modal ontology design pattern**:
   - The pattern provides an abstract upper-level ontology that can unify and connect multi-modal ontologies from different domains by separating the specific manifestations of a modality from its semantics.
2. **Support for the integration and co-existence of multi-modal information**:
   - The pattern allows connections to be established between different modalities of the same entity, enabling more comprehensive and context-aware entity modelling.
3. **Semantic consistency in modality definitions**:
   - By clearly distinguishing the nature of a modality from its content, the pattern promotes consistent modality definitions and reduces ambiguity across applications.
4. **Improved reusability and interoperability of multi-modal knowledge graphs**:
   - Because the pattern is designed to apply across fields and tasks, it improves the reusability and interoperability of multi-modal knowledge graphs.

Through these contributions, the paper lays a foundation for research on and applications of multi-modal knowledge graphs, and supports progress in how AI systems understand and process multi-modal information.
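The extensibility claim (user-defined modalities and relations between modalities of the same entity) can be illustrated with a second, self-contained Python sketch. It assumes that modalities are user-declared objects that may specialise a broader modality, and that cross-modal relations are typed links between manifestations of one entity. `ModalityRegistry`, `CrossModalLinks`, the relation name `transcription_of`, and the example IRIs are hypothetical illustrations, not terms from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical sketch: ModalityRegistry, CrossModalLinks and the relation
# name "transcription_of" are illustrative assumptions, not the paper's terms.


@dataclass(frozen=True)
class Modality:
    name: str
    parent: Modality | None = None  # optional broader modality


class ModalityRegistry:
    """Lets users declare new modalities, optionally as specialisations."""

    def __init__(self) -> None:
        self._by_name: dict[str, Modality] = {}

    def define(self, name: str, parent: str | None = None) -> Modality:
        if name in self._by_name:
            raise ValueError(f"modality {name!r} is already defined")
        # Raises KeyError if the parent modality has not been declared yet.
        broader = self._by_name[parent] if parent is not None else None
        modality = Modality(name, broader)
        self._by_name[name] = modality
        return modality

    @staticmethod
    def is_a(modality: Modality, ancestor: str) -> bool:
        # Walk up the parent chain to test whether `modality` specialises `ancestor`.
        current: Modality | None = modality
        while current is not None:
            if current.name == ancestor:
                return True
            current = current.parent
        return False


@dataclass(frozen=True)
class Manifestation:
    entity: str          # IRI of the entity whose semantics this expresses
    modality: Modality
    uri: str             # physical realisation, reduced to a single URI here


class CrossModalLinks:
    """Typed links between manifestations of the same entity."""

    def __init__(self) -> None:
        self._links: list[tuple[Manifestation, str, Manifestation]] = []

    def link(self, source: Manifestation, relation: str, target: Manifestation) -> None:
        # Design choice for this sketch: only connect modalities of one entity.
        if source.entity != target.entity:
            raise ValueError("links must join manifestations of the same entity")
        self._links.append((source, relation, target))

    def related(self, manifestation: Manifestation) -> list[tuple[str, Manifestation]]:
        return [(rel, tgt) for src, rel, tgt in self._links if src == manifestation]


# Usage: a user-defined "speech" modality specialising "audio".
registry = ModalityRegistry()
audio = registry.define("audio")
speech = registry.define("speech", parent="audio")
text = registry.define("text")

recording = Manifestation("ex:Speech42", speech, "https://example.org/s42.wav")
transcript = Manifestation("ex:Speech42", text, "https://example.org/s42.txt")

links = CrossModalLinks()
links.link(transcript, "transcription_of", recording)
print(registry.is_a(speech, "audio"))   # True
print(links.related(transcript)[0][0])  # transcription_of
```

Restricting links to manifestations of a single entity mirrors the integration contribution listed above; in an OWL or RDF setting such relations would presumably be modelled as object properties rather than Python objects.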