Ameliorating multimodal food classification using state of the art deep learning techniques

Avantika Saklani, Shailendra Tiwari, H. S. Pannu
DOI: https://doi.org/10.1007/s11042-023-17850-0
IF: 2.577
2024-01-05
Multimedia Tools and Applications
Abstract: Food categorization is an emerging research area owing to its growing benefits in the health and medical fields. Automated food recognition tools will aid in the creation of systems for tracking diets, estimating calories, and more. As the volume of available data grows rapidly, multimodal data fusion offers a way forward by combining information from more than one modality. In this paper, we employ multimodal fusion of visual features with their associated textual features to classify food data. Related research in food classification has used various deep learning methods to extract visual and textual features. In this work, we likewise use deep learning methods for feature extraction and aim to improve multimodal food classification by combining features from both modalities. The overall process is broken down into two steps: first, features are extracted from the image and the related text individually; second, they are combined for final classification. Image features are extracted using a fine-tuned Inception V4, and text features are extracted using RoBERTa with its layers kept trainable. The proposed approach is validated on the UPMC food dataset and on the Bharatiya Food dataset, which we newly created. The results show that the proposed multimodal framework outperforms several state-of-the-art methods.
computer science, information systems, theory & methods, engineering, electrical & electronic, software engineering
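The two-step process described in the abstract (extract features per modality, then combine them for classification) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' implementation: the abstract does not say how the features are combined, so plain concatenation followed by a single linear layer and softmax is used here. The feature vectors are random stand-ins for what a fine-tuned Inception V4 (1536-dimensional pooled output) and a RoBERTa-base encoder (768-dimensional hidden state) would produce, and the class count of 101 assumes the UPMC Food-101 variant of the dataset. The names `fuse`, `linear`, `softmax`, and `classify` are hypothetical helpers.

```python
import math
import random

IMAGE_DIM = 1536   # pooled feature size of Inception V4
TEXT_DIM = 768     # hidden size of RoBERTa-base (assumed model variant)
NUM_CLASSES = 101  # assumes the UPMC Food-101 label set


def fuse(image_feat, text_feat):
    """Feature-level fusion by concatenation (an assumed fusion strategy)."""
    return list(image_feat) + list(text_feat)


def linear(x, weights, bias):
    """Dense layer: one dot product plus bias per output class."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(image_feat, text_feat, weights, bias):
    """Fuse both modalities, then return (predicted class index, probabilities)."""
    probs = softmax(linear(fuse(image_feat, text_feat), weights, bias))
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs


# Random stand-ins for the extracted features and the classifier parameters.
rng = random.Random(0)
image_feat = [rng.gauss(0.0, 1.0) for _ in range(IMAGE_DIM)]
text_feat = [rng.gauss(0.0, 1.0) for _ in range(TEXT_DIM)]
weights = [[rng.gauss(0.0, 0.01) for _ in range(IMAGE_DIM + TEXT_DIM)]
           for _ in range(NUM_CLASSES)]
bias = [0.0] * NUM_CLASSES

label, probs = classify(image_feat, text_feat, weights, bias)
print(f"fused dimension: {IMAGE_DIM + TEXT_DIM}, predicted class index: {label}")
```

In a real pipeline the stand-in vectors would come from the two trained encoders, and the classifier weights would be learned jointly with them, since the abstract states that the encoder layers are kept trainable.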