Cross-Modal Learning for Anomaly Detection in Fused Magnesium Smelting Process: Methodology and Benchmark

Gaochang Wu, Yapeng Zhang, Lan Deng, Jingxin Zhang, Tianyou Chai
2024-06-13
Abstract: The Fused Magnesium Furnace (FMF) is a crucial piece of industrial equipment in the production of magnesia, and anomaly detection plays a pivotal role in ensuring its efficient, stable, and secure operation. Existing anomaly detection methods primarily focus on analyzing dominant anomalies using process variables (such as arc current) or on constructing neural networks based on abnormal visual features, while overlooking the intrinsic correlation of cross-modal information. This paper proposes a cross-modal Transformer (dubbed FmFormer), designed to facilitate anomaly detection in fused magnesium smelting processes by exploring the correlation between visual features (video) and process variables (current). Our approach introduces a novel tokenization paradigm to effectively bridge the substantial dimensionality gap between the 3D video modality and the 1D current modality in a multiscale manner, enabling hierarchical reconstruction for pixel-level anomaly detection. FmFormer then leverages self-attention to learn internal features within each modality and bidirectional cross-attention to capture correlations across modalities. To validate the effectiveness of the proposed method, we also present a pioneering cross-modal benchmark of the fused magnesium smelting process, featuring synchronously acquired video and current data for over 2.2 million samples. Leveraging cross-modal learning, the proposed FmFormer achieves state-of-the-art performance in detecting anomalies, particularly under extreme interference such as current fluctuations and visual occlusion caused by heavy water mist. The presented methodology and benchmark may be applicable to other industrial applications with some modifications. The benchmark will be released at https://github.com/GaochangWu/FMF-Benchmark.
Computer Vision and Pattern Recognition
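The abstract does not spell out how the tokenizer is built, so the following is only a minimal NumPy sketch of the general idea under assumed choices: the video clip is cut into non-overlapping space-time tubelets, the current trace into non-overlapping windows, and both are linearly projected to a shared token width, repeated at a finer and a coarser scale. The function names, patch sizes, window lengths, embedding width and the random projection (a stand-in for learned weights) are all hypothetical and are not FmFormer's actual design.

```python
import numpy as np

def video_patches(video, patch):
    """Cut a clip of shape (T, H, W, C) into non-overlapping tubelets of
    shape patch = (pt, ph, pw) and flatten each one into a row."""
    T, H, W, C = video.shape
    pt, ph, pw = patch
    assert T % pt == 0 and H % ph == 0 and W % pw == 0, "sizes must divide evenly"
    x = video.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)      # move the in-patch axes last
    return x.reshape(-1, pt * ph * pw * C)     # (num_video_tokens, patch_len)

def current_windows(current, window):
    """Cut a 1-D current signal into non-overlapping windows (tail dropped)."""
    n = current.shape[0] // window
    return current[: n * window].reshape(n, window)  # (num_current_tokens, window)

def embed(rows, dim, rng):
    """Hypothetical stand-in for a learned embedding: a random linear projection."""
    w = rng.standard_normal((rows.shape[1], dim)) / np.sqrt(rows.shape[1])
    return rows @ w

def multiscale_tokens(video, current, scales, dim=64, seed=0):
    """For each (patch, window) scale, return (video_tokens, current_tokens),
    both projected to the same embedding width `dim`."""
    rng = np.random.default_rng(seed)
    return [
        (embed(video_patches(video, patch), dim, rng),
         embed(current_windows(current, window), dim, rng))
        for patch, window in scales
    ]

if __name__ == "__main__":
    clip = np.random.default_rng(1).standard_normal((8, 32, 32, 3))  # 8 RGB frames (toy data)
    arc_current = np.random.default_rng(2).standard_normal(1000)      # toy 1-D current trace
    tokens = multiscale_tokens(clip, arc_current, [((2, 8, 8), 20), ((4, 16, 16), 40)])
    print([(v.shape, c.shape) for v, c in tokens])
    # [((64, 64), (50, 64)), ((8, 64), (25, 64))]
```

Pairing small tubelets with short current windows at one scale and larger ones at another is just one way to realize "multiscale"; the point illustrated is that both modalities end up as sequences of same-width tokens, which is what lets attention operate across them.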
What problem does this paper attempt to address?
The paper addresses anomaly detection in the fused magnesium smelting process. Specifically:
1. **Problem Background**: The fused magnesium furnace (FMF) is a key piece of equipment for producing magnesia (magnesium oxide), and anomaly detection is crucial for keeping it running efficiently, stably, and safely. Existing anomaly detection methods mainly rely either on process variables (such as arc current) or on neural networks built from abnormal visual features, and they neglect the intrinsic correlation between these two modalities.
2. **Proposed Method**: The paper proposes a cross-modal Transformer, named FmFormer, that detects anomalies in the fused magnesium smelting process by exploring the correlation between video (visual features) and current (process variables). It introduces a novel tokenization paradigm that bridges the large dimensionality gap between the 3D video modality and the 1D current modality in a multiscale manner, enabling hierarchical reconstruction for pixel-level anomaly detection. FmFormer then uses self-attention to learn features within each modality and bidirectional cross-attention to capture correlations across modalities.
3. **Experimental Validation**: To validate the method, the authors also present a pioneering cross-modal benchmark of the fused magnesium smelting process, with synchronously acquired video and current data for over 2.2 million samples. Using cross-modal learning, FmFormer achieves state-of-the-art anomaly detection performance, particularly under extreme interference such as current fluctuations and visual occlusion caused by heavy water mist.
In summary, the paper aims to improve the accuracy and robustness of anomaly detection in the fused magnesium smelting process by integrating video and current information.
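As a hedged illustration of the attention pattern described above (self-attention within each modality, bidirectional cross-attention across them), here is a single-head NumPy sketch of one fusion block: each modality first attends to itself, then video tokens query the current tokens and current tokens query the video tokens. The block layout, residual additions, random weights, token counts and names such as `fusion_block` are assumptions for illustration only; FmFormer's actual layer ordering, number of heads, normalization and feed-forward layers are not described in this summary and are omitted here.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q_tokens, kv_tokens, wq, wk, wv):
    """Single-head scaled dot-product attention: q_tokens attend to kv_tokens."""
    q, k, v = q_tokens @ wq, kv_tokens @ wk, kv_tokens @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (num_queries, num_keys)
    return softmax(scores) @ v                # one output row per query token

def make_params(dim, rng):
    """Hypothetical random (Wq, Wk, Wv) projections standing in for learned weights."""
    return [rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(3)]

def fusion_block(video_tok, current_tok, params):
    """Self-attention within each modality, then bidirectional cross-attention.
    Residual additions keep each stream's token count and width unchanged."""
    v = video_tok + attention(video_tok, video_tok, *params["self_v"])
    c = current_tok + attention(current_tok, current_tok, *params["self_c"])
    # Both directions read the same intermediate v and c.
    v_out = v + attention(v, c, *params["v_from_c"])  # video queries current
    c_out = c + attention(c, v, *params["c_from_v"])  # current queries video
    return v_out, c_out

if __name__ == "__main__":
    dim = 64
    rng = np.random.default_rng(0)
    names = ("self_v", "self_c", "v_from_c", "c_from_v")
    params = {name: make_params(dim, rng) for name in names}
    video_tok = rng.standard_normal((64, dim))    # toy video tokens
    current_tok = rng.standard_normal((50, dim))  # toy current tokens
    v_out, c_out = fusion_block(video_tok, current_tok, params)
    print(v_out.shape, c_out.shape)  # (64, 64) (50, 64)
```

Because the two token sequences may have different lengths, cross-attention only requires a shared embedding width; each stream keeps its own length, so the video stream can still be decoded back to a spatial map while being informed by the current signal, and vice versa.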