Enhanced fish species classification using dynamic multilayer perceptron and transformer encoders with extra distribution data

Mei-Hsin Chen, Ting-Hsuan Lai, Yao-Chung Chen, Tien-Yin Chou
DOI: https://doi.org/10.1007/s11042-024-20359-9
IF: 2.577
2024-10-30
Multimedia Tools and Applications
Abstract: This study introduces an integrative framework for fine-grained fish species classification that improves recognition accuracy by leveraging multimodal features. We combine a Transformer Encoder and a Dynamic Multilayer Perceptron to fuse image features with geospatial information. Using data from the Taiwan Fish Database, our method draws not only on photographs of fish and their capture locations but also on extra fish distribution survey data that has no accompanying photographs. Using only photographs, the model achieved an accuracy of 0.7433. Adding geospatial data increased accuracy to 0.7731, marking a 3.93% improvement over the baseline. Integrating the additional fish distribution data further raised accuracy to 0.8000, an overall enhancement of 6.2% compared to the baseline. This approach underscores the potential of combining photo-linked geospatial information with extra distribution data in fine-grained image classification tasks, with relevance to both scientific research and practical applications.
computer science, information systems, theory & methods, engineering, electrical & electronic, software engineering
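
The fusion idea described in the abstract, where location information modulates image features before classification, can be illustrated with a toy sketch. This is not the authors' architecture: it omits the Transformer Encoder and any training, uses random weights, and all names (`encode_location`, `dynamic_fusion`, `weight_generator`), dimensions, and input values are hypothetical. It only shows, loosely in the spirit of a dynamic MLP, how a projection matrix can be generated from a location feature and applied to an image feature.

```python
import math
import random

def encode_location(lat_deg, lon_deg):
    """Map latitude/longitude to a smooth feature vector (assumed encoding)."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return [math.sin(lat), math.cos(lat), math.sin(lon), math.cos(lon)]

def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def random_matrix(rows, cols, rng):
    """Random weights standing in for trained parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def dynamic_fusion(image_feat, loc_feat, weight_generator, classifier):
    """Project the image feature with location-generated weights, then classify."""
    dim = len(image_feat)
    # The location feature generates a flattened dim x dim projection matrix.
    flat = matvec(weight_generator, loc_feat)
    dynamic_weights = [flat[i * dim:(i + 1) * dim] for i in range(dim)]
    # Apply the generated projection to the image feature, followed by ReLU.
    projected = [max(0.0, v) for v in matvec(dynamic_weights, image_feat)]
    # A residual connection keeps the original image evidence.
    fused = [a + b for a, b in zip(image_feat, projected)]
    return softmax(matvec(classifier, fused))

rng = random.Random(0)
IMAGE_DIM, NUM_SPECIES = 4, 3  # toy sizes, hypothetical
weight_generator = random_matrix(IMAGE_DIM * IMAGE_DIM, 4, rng)
classifier = random_matrix(NUM_SPECIES, IMAGE_DIM, rng)

image_feat = [0.2, -0.1, 0.7, 0.4]       # stand-in for an image-encoder output
loc_feat = encode_location(23.5, 121.0)  # illustrative coordinates
probs = dynamic_fusion(image_feat, loc_feat, weight_generator, classifier)
print(probs)
```

In a trained model the generator and classifier weights would be learned jointly, so that the same photograph could receive different class probabilities depending on where it was taken.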