Multimodal Fusion Induced Attention Network for Industrial VOCs Detection

Yu Kang, Kehao Shi, Jifang Tan, Yang Cao, Lijun Zhao, Zhenyi Xu
DOI: https://doi.org/10.1109/tai.2024.3436037
2024-01-01
IEEE Transactions on Artificial Intelligence
Abstract: Industrial volatile organic compounds (VOCs) emissions and leakage cause serious problems for the environment and public safety. Traditional VOCs monitoring systems require professionals to carry gas sensors into the emission area to collect VOCs, which can cause secondary hazards. Visual inspection based on VOCs infrared imaging is a convenient and low-cost alternative. However, current visual detection methods that use VOCs infrared imaging are limited by blurred imaging and indeterminate gas shapes. Moreover, most existing work relies only on the infrared modality for VOCs emission detection, which neglects the semantic representation of VOCs. To this end, we propose a dual-stream fusion detection framework that processes both visible and infrared features of VOCs. Additionally, a multimodal fusion induced attention module (MFIA) is designed to fuse features across modalities. Specifically, MFIA uses a spatial attention fusion module (SAFM) to mine associations between modalities in terms of spatial location and generates fused features by spatial location weighting. Then, a modality adapter (MA) and an induced attention module (IAM) weight latent VOCs regions in the infrared features, which alleviates the noise interference and the degradation of VOCs characterization caused by fusion. Finally, comprehensive experiments are carried out on a challenging VOCs dataset. The proposed model achieves an mAP@.5 of 0.527 and an F1-score of 0.601, exceeding the state-of-the-art methods by 3.3% and 3.4%, respectively.
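The abstract does not give the internals of SAFM, so the following is only a minimal sketch of what "fusion by spatial location weighting" could look like. It is not the authors' implementation. The function name `spatial_attention_fusion`, the use of channel-wise mean and max pooling as location descriptors, and the softmax across the two modalities are all assumptions made for illustration.

```python
import numpy as np


def spatial_attention_fusion(visible, infrared):
    """Fuse two (C, H, W) feature maps with per-location modality weights.

    Hypothetical sketch: the pooling descriptors and the softmax are
    illustrative assumptions, not details taken from the paper.
    """
    if visible.shape != infrared.shape:
        raise ValueError("both feature maps must share the same (C, H, W) shape")

    def descriptor(features):
        # Summarise each spatial location by pooling over the channel axis.
        return features.mean(axis=0) + features.max(axis=0)  # shape (H, W)

    # Stack one score map per modality: shape (2, H, W).
    logits = np.stack([descriptor(visible), descriptor(infrared)])

    # Numerically stable softmax across the modality axis, so the two
    # weights at every spatial location sum to 1.
    logits = logits - logits.max(axis=0, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=0, keepdims=True)

    # Broadcast each (H, W) weight map over the channel axis and mix.
    fused = weights[0][None, :, :] * visible + weights[1][None, :, :] * infrared
    return fused, weights


rng = np.random.default_rng(0)
vis = rng.normal(size=(4, 3, 3))  # stand-in for visible-stream features
ir = rng.normal(size=(4, 3, 3))   # stand-in for infrared-stream features
fused, weights = spatial_attention_fusion(vis, ir)
```

In this sketch the fused map keeps the input shape, and a location where one modality responds more strongly draws more of its value from that modality. The modality adapter and induced attention module mentioned in the abstract are not modelled here.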