Unsupervised Multimodal Fusion of In-process Sensor Data for Advanced Manufacturing Process Monitoring

Matthew McKinney, Anthony Garland, Dale Cillessen, Jesse Adamczyk, Dan Bolintineanu, Michael Heiden, Elliott Fowler, Brad L. Boyce
2024-10-30
Abstract: Effective monitoring of manufacturing processes is crucial for maintaining product quality and operational efficiency. Modern manufacturing environments generate vast amounts of multimodal data, including visual imagery from various perspectives and resolutions, hyperspectral data, and machine health monitoring information such as actuator positions, accelerometer readings, and temperature measurements. However, interpreting this complex, high-dimensional data presents significant challenges, particularly when labeled datasets are unavailable. This paper presents a novel approach to multimodal sensor data fusion in manufacturing processes, inspired by the Contrastive Language-Image Pre-training (CLIP) model. We leverage contrastive learning techniques to correlate different data modalities without the need for labeled data, developing encoders for five distinct modalities: visual imagery, audio signals, laser position (x and y coordinates), and laser power measurements. By compressing these high-dimensional datasets into low-dimensional representational spaces, our approach facilitates downstream tasks such as process control, anomaly detection, and quality assurance. We evaluate the effectiveness of our approach through experiments, demonstrating its potential to enhance process monitoring capabilities in advanced manufacturing systems. This research contributes to smart manufacturing by providing a flexible, scalable framework for multimodal data fusion that can adapt to diverse manufacturing environments and sensor configurations.
Machine Learning, Signal Processing
What problem does this paper attempt to address?
The paper asks how to fuse multimodal sensor data effectively when no labeled data is available, so that advanced manufacturing processes can be monitored. Specifically, the authors propose an unsupervised multimodal data fusion method based on contrastive learning for processing and analyzing the complex, high-dimensional, multimodal data generated in additive manufacturing processes such as laser powder bed fusion (LPBF).

### Core of the problem

1. **Challenges in multimodal data fusion**:
   - Sensors in modern manufacturing environments generate diverse data types, including visual images, audio signals, and machine health monitoring information (such as actuator positions, accelerometer readings, and temperature measurements). This data is high-dimensional and heterogeneous, which makes it difficult to interpret and use directly.
   - Labeled data is expensive and often impractical to acquire, which limits the use of traditional supervised learning methods.
2. **Limitations of existing methods**:
   - Traditional sensor fusion methods usually rely on supervised learning and require large amounts of labeled data, which is difficult to obtain in a dynamic manufacturing environment.
   - Effective multimodal fusion methods are lacking, especially for high-dimensional inputs (such as audio, images, and thermal videos), and existing research has not fully demonstrated the synergy between modalities.

### Solution proposed in the paper

The authors introduce a contrastive learning method inspired by the CLIP model, which addresses the problem in the following ways:

- **Unsupervised learning framework**: Contrastive learning is used to correlate different data modalities without labeled data, which avoids the labeling requirement that limits supervised methods in manufacturing environments.
- **Low-dimensional representation space**: High-dimensional manufacturing data is compressed into a low-dimensional representation space to support downstream tasks such as process control, anomaly detection, and quality assurance.
- **Multimodal data processing**: Encoders are developed for five modalities (visual images, audio signals, laser x position, laser y position, and laser power measurements) and trained with a contrastive loss function.

### Experimental verification

The paper evaluates the proposed method through a series of experiments, demonstrating its potential to enhance monitoring capabilities in complex manufacturing processes. The framework is not specific to LPBF; the authors present it as adaptable to other manufacturing environments and sensor configurations.

### Formula summary

The contrastive loss function \( L \) is defined as follows:

\[
L = -\frac{1}{N} \sum_{i = 1}^{N} \left[ \log \left( \frac{\exp(\text{sim}(x_i, y_i)/\tau)}{\sum_{j = 1}^{N} \exp(\text{sim}(x_i, y_j)/\tau)} \right) + \log \left( \frac{\exp(\text{sim}(y_i, x_i)/\tau)}{\sum_{j = 1}^{N} \exp(\text{sim}(y_i, x_j)/\tau)} \right) \right]
\]

where:

- \( \text{sim}(u, v) = \frac{u^T v}{\lVert u \rVert \, \lVert v \rVert} \) is the cosine similarity between vectors \( u \) and \( v \),
- \( \tau \) is the temperature parameter that controls the sharpness of the softmax distribution,
- \( x_i \) and \( y_i \) are the embeddings of the \( i \)-th image and audio inputs, respectively,
- \( N \) is the number of paired samples in the batch.

In this way, the paper provides a flexible and extensible multimodal data fusion framework that can adapt to different sensor configurations in various manufacturing environments and support stronger data-driven decision-making.
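The contrastive loss in the formula summary can be sketched in a few lines of NumPy. This is a minimal illustration of the formula as written here, not the authors' implementation: the function name `symmetric_contrastive_loss`, the helper `_log_softmax`, and the default `tau=0.07` are assumptions made for the example, and the inputs are assumed to be already-encoded embeddings of shape `(N, d)` whose rows are paired across the two modalities.

```python
import numpy as np


def _log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_contrastive_loss(x, y, tau=0.07):
    """Symmetric contrastive loss between two batches of paired embeddings.

    x, y: arrays of shape (N, d); row i of x is paired with row i of y.
    tau:  temperature (illustrative default, not taken from the paper).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Normalising the rows makes the dot product equal to cosine similarity.
    x_n = x / np.linalg.norm(x, axis=1, keepdims=True)
    y_n = y / np.linalg.norm(y, axis=1, keepdims=True)

    # logits[i, j] = sim(x_i, y_j) / tau
    logits = x_n @ y_n.T / tau

    # First term: for each x_i, softmax over all y_j (along rows).
    log_p_xy = np.diag(_log_softmax(logits, axis=1))
    # Second term: for each y_i, softmax over all x_j (along columns).
    log_p_yx = np.diag(_log_softmax(logits, axis=0))

    # L = -(1/N) * sum_i [first term + second term]
    return float(-(log_p_xy + log_p_yx).mean())


# Hypothetical usage with random stand-ins for image and audio embeddings.
rng = np.random.default_rng(0)
image_emb = rng.normal(size=(8, 16))
audio_emb = image_emb + 0.1 * rng.normal(size=(8, 16))  # nearly aligned pairs
print(symmetric_contrastive_loss(image_emb, audio_emb))
```

The loss is small when each embedding is most similar to its own partner and large when the pairing is scrambled, which is what pushes the encoders toward a shared low-dimensional space. With more than two modalities, one plausible option is to sum this loss over modality pairs; how the pairs are combined in the paper is not stated in this summary.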