Machine Anomalous Sound Detection Using Spectral-temporal Modulation Representations Derived from Machine-specific Filterbanks

Kai Li, Khalid Zaman, Xingfeng Li, Masato Akagi, Masashi Unoki
DOI: https://doi.org/10.48550/arXiv.2409.05319
2024-09-09
Abstract: Early detection of factory machinery malfunctions is crucial in industrial applications. In machine anomalous sound detection (ASD), different machines exhibit unique vibration-frequency ranges based on their physical properties. Meanwhile, the human auditory system is adept at tracking both temporal and spectral dynamics of machine sounds. Consequently, integrating computational auditory models of the human auditory system with machine-specific properties can be an effective approach to machine ASD. We first quantified the frequency importances of four types of machines using the Fisher ratio (F-ratio). The quantified frequency importances were then used to design machine-specific non-uniform filterbanks (NUFBs), which extract the log non-uniform spectrum (LNS) feature. The designed NUFBs have a narrower bandwidth and higher filter distribution density in frequency regions with relatively high F-ratios. Finally, spectral and temporal modulation representations derived from the LNS feature were proposed. The proposed LNS feature and modulation representations are input into an autoencoder neural-network-based detector for ASD. The quantification results from the training set of the Malfunctioning Industrial Machine Investigation and Inspection dataset with a signal-to-noise ratio (SNR) of 6 dB reveal that the distinguishing information between normal and anomalous sounds of different machines is encoded non-uniformly in the frequency domain. By highlighting these important frequency regions using NUFBs, the LNS feature significantly improves performance, measured by AUC (area under the receiver operating characteristic curve), under various SNR conditions. Modulation representations improve performance further. Specifically, temporal modulation is effective for fans, pumps, and sliders, while spectral modulation is particularly effective for valves.
Sound, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses machine anomalous sound detection (ASD) in industrial environments. Specifically, the authors investigate how spectral-temporal modulation representations can improve early fault detection for different types of factory machinery. The problem has three parts:

1. **Machine-specific vibration-frequency range**: Different types of machines exhibit unique vibration-frequency ranges due to their physical characteristics, so the anomalous sounds of each machine may be most distinctive in different frequency ranges.
2. **Characteristics of the human auditory system**: The human auditory system tracks both the temporal and spectral dynamics of machine sounds well. Combining computational auditory models with machine-specific properties can therefore be an effective approach to machine ASD.
3. **Limitations of existing methods**: Existing autoencoder (AE) models are trained in an unsupervised manner, without anomalous sound data, which may limit their ability to discriminate anomalous sounds. In addition, traditional filterbank designs (such as Mel and Gammatone filterbanks) may not fully capture important high-frequency information in machine sounds, which degrades detection performance.

To address these problems, the authors propose the following:

1. **Quantifying frequency importance**: Use the Fisher ratio (F-ratio) to quantify the frequency importance for four types of machines (fans, pumps, sliders, valves). The F-ratio is defined as \[ F_m=\frac{\frac{1}{2}\sum_c(u_{m,c}-u_m)^2}{\frac{1}{2N}\sum_c\sum_{i = 1}^N(x_{i,m,c}-u_{m,c})^2} \] where \(x_{i,m,c}\) is the energy of the \(i\)-th audio sample in sub-band \(m\) for class \(c\), \(u_{m,c}\) and \(u_m\) are the sub-band energy averages of class \(c\) and of all classes, respectively, and the inner sum runs over the \(N\) samples of each class.
2. **Designing machine-specific non-uniform filterbanks (NUFBs)**: Design machine-specific NUFBs from the quantification results and use them to extract the log non-uniform spectrum (LNS) feature. These NUFBs have narrower bandwidths and higher filter distribution densities in frequency regions with high F-ratios.
3. **Spectral-temporal modulation representations**: Extract spectral modulation (SM), temporal modulation (TM), and spectral-temporal modulation (STM) representations from the LNS feature and input them into an autoencoder-based detector for ASD.
4. **Experimental verification**: Run experiments on the Malfunctioning Industrial Machine Investigation and Inspection (MIMII) dataset to verify the effectiveness of the proposed method under different signal-to-noise ratio conditions.

In summary, this research aims to develop a more effective and robust method for machine ASD by combining computational auditory models with machine-specific properties.
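
To make the F-ratio definition above concrete, the following is a minimal NumPy sketch of the per-sub-band computation. It is an illustration under stated assumptions, not the authors' implementation: the function name `f_ratio`, the `(classes, samples, sub-bands)` array layout, and the toy data are hypothetical, and the sketch assumes every class has the same number of samples \(N\) (with two classes, normal and anomalous, the normalizers reduce to the \(\frac{1}{2}\) and \(\frac{1}{2N}\) in the formula).

```python
import numpy as np


def f_ratio(energies):
    """Per-sub-band Fisher ratio (hypothetical helper, not from the paper).

    energies: array of shape (C, N, M) holding sub-band energies for
    C classes, N samples per class, and M sub-bands (assumed layout).
    Returns an array of shape (M,), one F-ratio per sub-band.
    """
    energies = np.asarray(energies, dtype=float)
    n_classes, n_samples, _ = energies.shape

    # u_{m,c}: mean sub-band energy of each class, shape (C, M)
    class_means = energies.mean(axis=1)
    # u_m: mean over all classes, shape (M,)
    overall_mean = class_means.mean(axis=0)

    # Numerator: between-class variance of the class means
    between = ((class_means - overall_mean) ** 2).sum(axis=0) / n_classes

    # Denominator: within-class variance, pooled over classes and samples
    deviations = energies - class_means[:, None, :]
    within = (deviations ** 2).sum(axis=(0, 1)) / (n_classes * n_samples)

    return between / within


# Toy data: 2 classes, 2 samples each, 2 sub-bands (made-up numbers).
# Sub-band 0 separates the two classes; sub-band 1 does not.
toy = np.array([
    [[1.0, 0.0], [3.0, 2.0]],   # class 0 (e.g. normal)
    [[5.0, 0.0], [7.0, 2.0]],   # class 1 (e.g. anomalous)
])
print(f_ratio(toy))
```

In this toy case the first sub-band receives a high F-ratio and the second a ratio of zero, which is the kind of contrast the summary describes as guiding where a non-uniform filterbank places narrower, denser filters.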