Abstract: Effective monitoring of manufacturing processes is crucial for maintaining product quality and operational efficiency. Modern manufacturing environments generate vast amounts of multimodal data, including visual imagery from various perspectives and resolutions, hyperspectral data, and machine health monitoring information such as actuator positions, accelerometer readings, and temperature measurements. However, interpreting this complex, high-dimensional data presents significant challenges, particularly when labeled datasets are unavailable. This paper presents a novel approach to multimodal sensor data fusion in manufacturing processes, inspired by the Contrastive Language-Image Pre-training (CLIP) model. We leverage contrastive learning techniques to correlate different data modalities without the need for labeled data, developing encoders for five distinct modalities: visual imagery, audio signals, laser x and y positions, and laser power measurements. By compressing these high-dimensional datasets into low-dimensional representational spaces, our approach facilitates downstream tasks such as process control, anomaly detection, and quality assurance. We evaluate the effectiveness of our approach through experiments, demonstrating its potential to enhance process monitoring capabilities in advanced manufacturing systems. This research contributes to smart manufacturing by providing a flexible, scalable framework for multimodal data fusion that can adapt to diverse manufacturing environments and sensor configurations.
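The abstract states that compressing sensor data into a low-dimensional space facilitates anomaly detection. The sketch below illustrates one generic way to use such a space once the encoders are trained. It is a minimal, non-authoritative example, and it assumes the following:

- Each sample has already been mapped to an embedding vector.
- A reference set of embeddings from nominal (assumed defect-free) process data is available.

The sketch scores a new sample by its mean cosine distance to the k nearest reference embeddings. This k-nearest-neighbor scoring is a swapped-in technique, not the paper's method. The function names, `k`, and the threshold are hypothetical.

```python
import math

# Hypothetical post-training step: embeddings are assumed to come from
# already-trained encoders; nothing here is taken from the paper itself.
Vector = list[float]


def normalize(v: Vector) -> Vector:
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def anomaly_score(query: Vector, reference: list[Vector], k: int = 3) -> float:
    """Mean cosine distance from `query` to its k nearest reference embeddings."""
    if not reference or k < 1:
        raise ValueError("need a non-empty reference set and k >= 1")
    q = normalize(query)
    # Cosine distance = 1 - cosine similarity of the unit-length vectors.
    distances = sorted(
        1.0 - sum(a * b for a, b in zip(q, normalize(r))) for r in reference
    )
    nearest = distances[:k]
    return sum(nearest) / len(nearest)


def flag_anomalies(queries: list[Vector], reference: list[Vector],
                   threshold: float, k: int = 3) -> list[bool]:
    """Flag each query whose score exceeds the (hypothetical) threshold."""
    return [anomaly_score(q, reference, k) > threshold for q in queries]
```

A nearest-neighbor score needs no labels, which matches the unsupervised setting. In practice, the reference set and threshold would have to be chosen from nominal build data.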

What problem does this paper attempt to address?

The paper addresses how to fuse multimodal sensor data effectively, without labeled data, in order to monitor advanced manufacturing processes. Specifically, the authors propose an unsupervised multimodal fusion method based on contrastive learning. It processes and analyzes the complex, high-dimensional multimodal data generated by additive manufacturing processes such as laser powder bed fusion (LPBF).

### Core of the problem

1. **Challenges in multimodal data fusion**:
   - Sensors in modern manufacturing environments produce diverse data types. These include visual images, audio signals, and machine health monitoring information (e.g., actuator positions, accelerometer readings, and temperature measurements). These data are high-dimensional and heterogeneous, which makes them difficult to interpret and use directly.
   - Labeled data are expensive and often impractical to acquire, which limits traditional supervised learning methods.
2. **Limitations of existing methods**:
   - Traditional sensor fusion methods usually rely on supervised learning and require large amounts of labeled data. Such data are hard to obtain in a dynamic manufacturing environment.
   - Effective fusion methods are lacking for high-dimensional inputs such as audio, images, and thermal video. Existing research has also not fully demonstrated the synergy between modalities.

### Solution proposed in the paper

The authors introduce a contrastive learning method inspired by the CLIP model. It addresses these problems as follows:

- **Unsupervised learning framework**: Contrastive learning correlates different data modalities without labeled data. This overcomes the reliance of supervised methods on annotations, which are scarce in manufacturing environments.
- **Low-dimensional representation space**: High-dimensional manufacturing data are compressed into a low-dimensional representation space. This simplifies downstream tasks such as process control, anomaly detection, and quality assurance.
- **Multimodal data processing**: Encoders are developed for five modalities (visual images, audio signals, laser x and y positions, and laser power) and trained with a contrastive loss function.

### Experimental verification

The paper evaluates the method through a series of experiments, demonstrating its potential to enhance monitoring of complex manufacturing processes. The method is demonstrated on LPBF but is intended to extend to other manufacturing domains.

### Formula summary

The contrastive loss \( L \) is defined as:

\[ L = -\frac{1}{N} \sum_{i=1}^{N} \left[ \log \frac{\exp(\text{sim}(x_i, y_i)/\tau)}{\sum_{j=1}^{N} \exp(\text{sim}(x_i, y_j)/\tau)} + \log \frac{\exp(\text{sim}(y_i, x_i)/\tau)}{\sum_{j=1}^{N} \exp(\text{sim}(y_i, x_j)/\tau)} \right] \]

where:

- \( \text{sim}(u, v) = \frac{u^\top v}{\|u\| \, \|v\|} \) is the cosine similarity between vectors \( u \) and \( v \);
- \( \tau \) is the temperature parameter that controls the sharpness of the softmax distribution;
- \( x_i \) and \( y_i \) are the encoded representations of the \( i \)-th image and audio input, respectively, in a batch of \( N \) matched pairs.

Together, these elements form a flexible, extensible multimodal data fusion framework. It can adapt to different sensor configurations across manufacturing environments and support data-driven decision-making.
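The loss above can be sketched in plain Python as follows. This is a minimal, non-authoritative sketch that assumes the encoders have already produced one embedding vector per sample. It makes these assumptions, none of which come from the paper:

- The function names are hypothetical.
- `tau=0.07` is only a placeholder temperature.
- The `multimodal_loss` helper simply sums the pairwise loss over every pair of modalities. The summary above does not say how the five modalities are actually combined.

```python
import math
from itertools import combinations

Vector = list[float]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """sim(u, v) = u^T v / (||u|| ||v||); both vectors must be non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def log_softmax_at(logits: list[float], index: int) -> float:
    """Numerically stable log(exp(logits[index]) / sum_j exp(logits[j]))."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return logits[index] - log_sum


def symmetric_contrastive_loss(x: list[Vector], y: list[Vector],
                               tau: float = 0.07) -> float:
    """Loss L: x[i] and y[i] are embeddings of the i-th matched (image, audio) pair."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty batches of equal size")
    total = 0.0
    for i in range(n):
        # x_i scored against every y_j; the matched y_i is the positive.
        x_to_y = [cosine_similarity(x[i], y[j]) / tau for j in range(n)]
        # y_i scored against every x_j; the matched x_i is the positive.
        y_to_x = [cosine_similarity(y[i], x[j]) / tau for j in range(n)]
        total += log_softmax_at(x_to_y, i) + log_softmax_at(y_to_x, i)
    return -total / n


def multimodal_loss(embeddings: dict[str, list[Vector]], tau: float = 0.07) -> float:
    """HYPOTHETICAL extension to more than two modalities: sum the pairwise
    loss over every pair of modality names (e.g. image, audio, laser_x, ...)."""
    names = sorted(embeddings)
    return sum(
        symmetric_contrastive_loss(embeddings[a], embeddings[b], tau)
        for a, b in combinations(names, 2)
    )
```

With matched pairs whose embeddings are aligned, the loss is lower than when the audio embeddings are swapped between samples. When every similarity is equal (an uninformative encoder), each log term equals \( -\log N \), so the loss equals \( 2 \log N \).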

Unsupervised Multimodal Fusion of In-process Sensor Data for Advanced Manufacturing Process Monitoring

Audio-visual cross-modality knowledge transfer for machine learning-based in-situ monitoring in laser additive manufacturing

Effective Techniques for Multimodal Data Fusion: A Comparative Analysis

Multi-modal sensor fusion with machine learning for data-driven process monitoring for additive manufacturing

Multimodal Correlation-Aware Fusion Framework for Enhanced Machinery Health Prognosis With Unlabeled and Low-Quality Data Exploitation

Deep Multimodal Data Fusion

Incomplete Multimodal Industrial Anomaly Detection via Cross-Modal Distillation

Towards Robust Human-Robot Collaborative Manufacturing: Multimodal Fusion.

A Review of Multisensor Data Fusion Solutions in Smart Manufacturing: Systems and Trends

A Convolutional Neural Network-Based Multi-Sensor Fusion Approach for In-Situ Quality Monitoring of Selective Laser Melting

JEMA: A Joint Embedding Framework for Scalable Co-Learning with Multimodal Alignment

Multimodal Industrial Anomaly Detection via Hybrid Fusion

Multimodal data fusion using signal/image processing methods for multi-class machine learning

Machine Learning Multi-Modality Fusion Approaches Outperform Single-Modality & Traditional Approaches

Multi-sensor measurement and data fusion technology for manufacturing process monitoring: a literature review

Novel Multimodal Data Fusion Soft Sensor Modeling Framework Based on Meta-Learning Networks for Complex Chemical Process

Multimodal Object Detection using Depth and Image Data for Manufacturing Parts

Jointly Optimizing Sensing Pipelines for Multimodal Mixed Reality Interaction

An integrated manifold learning approach for high-dimensional data feature extractions and its applications to online process monitoring of additive manufacturing

Multimodal Sensors and ML‐Based Data Fusion for Advanced Robots