Incorporating multivariate semantic association graphs into multimodal networks for information extraction from documents

Shun Luo, Juan Yu, Yunjiang Xi
DOI: https://doi.org/10.1007/s11227-024-06174-x
IF: 3.3
2024-05-24
The Journal of Supercomputing
Abstract: Documents contain abundant information that can support managerial decision-making. However, manual screening of document information lacks accuracy because documents are heterogeneous. To address this issue, we propose MMIE, a multimodal network that incorporates multivariate semantic association graphs, for accurately extracting information from documents. First, multivariate semantic graphs linking the multimodal data within each modality are constructed from the semantic associations of the text content; the semantic relationships in these graphs then guide the fusion and embedding of the extracted multimodal data, improving feature representation capability. Next, the semantically linked multimodal information is fed into a newly constructed multimodal self-attention module to better establish inter-modal associations. Finally, a supervised comparison learning loss function is employed to further reduce the information loss caused by sample imbalance. Experimental results on three real datasets show that the proposed model extracts feature information from the different modalities more accurately, with F1 scores of 87.28, 82.53, and 81.17, respectively.
computer science, theory & methods; engineering, electrical & electronic, hardware & architecture
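
The abstract does not give the form of its supervised comparison learning loss. As a rough illustration only, the sketch below assumes it resembles the widely used supervised contrastive loss, in which samples sharing a label are pulled together and all others are pushed apart. The function name, the `temperature` parameter, and the toy embeddings are hypothetical and are not taken from the paper.

```python
import math


def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Mean supervised contrastive loss over anchors that have a positive.

    Assumed formulation, not the paper's confirmed loss.
    """
    # L2-normalise each vector so dot products are cosine similarities.
    unit = []
    for vec in features:
        norm = math.sqrt(sum(x * x for x in vec))
        unit.append([x / norm for x in vec])

    n = len(unit)
    per_anchor = []
    for i in range(n):
        # Temperature-scaled similarity of anchor i to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(unit[i], unit[j])) / temperature
            for j in range(n)
            if j != i
        }
        positives = [j for j in sims if labels[j] == labels[i]]
        if not positives:
            continue  # no same-label partner for this anchor in the batch
        # Log-sum-exp over all non-anchor samples, shifted for stability.
        peak = max(sims.values())
        log_denom = peak + math.log(sum(math.exp(s - peak) for s in sims.values()))
        # Average log-probability assigned to the anchor's positives.
        per_anchor.append(
            sum(sims[j] - log_denom for j in positives) / len(positives)
        )

    return -sum(per_anchor) / len(per_anchor) if per_anchor else 0.0


# Toy embeddings: two tight pairs pointing in different directions.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(supervised_contrastive_loss(embeddings, [0, 0, 1, 1]))  # labels match geometry
print(supervised_contrastive_loss(embeddings, [0, 1, 0, 1]))  # labels cut across it
```

Under this assumed formulation, every anchor with at least one same-label partner contributes equally to the mean, whatever the size of its class. Anchors with no partner in the batch are skipped.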
What problem does this paper attempt to address?