Efficient Iterative Dynamic Kernel Principal Component Analysis Monitoring Method for the Batch Process with Super-large-scale Data Sets

Yajun Wang, Hongli Yu, Xiaohui Li
DOI: https://doi.org/10.1021/acsomega.0c06039
IF: 4.1
2021-04-06
ACS Omega
Abstract: The Internet environment has provided massive data to the actual industrial production process. These data are not only large in volume but also high in dimension, which challenges traditional statistical process monitoring. Aiming at the nonlinearity and dynamics of large-scale, high-dimensional industrial data, an efficient iterative multiple dynamic kernel principal component analysis (IMDKPCA) method is proposed to monitor complex industrial processes with super-large-scale, high-dimensional data. In KPCA, a new matrix KK<sup>T</sup> is first created from the kernel matrix K. According to the properties of symmetric matrices, the newly constructed matrix has the same eigenvectors as the original matrix K; hence, each column of K can be used as an input sample of the iteration algorithm. After the iterative operation, the kernel principal components can be obtained quickly without eigendecomposition. Because the kernel matrix is not stored beforehand, the algorithm effectively reduces the computational complexity of the kernel. For very large data sets in particular, traditional eigendecomposition is no longer appropriate, whereas the presented method can still be solved quickly. The autoregressive moving average (ARMA) time series model and kernel principal component analysis (KPCA) are combined to build the IDKPCA model for dealing with the dynamics and nonlinearity in the industrial process. Finally, the method is applied to fault monitoring in the penicillin fermentation process and compared with MKPCA to verify its accuracy and applicability.
chemistry, multidisciplinary
What problem does this paper attempt to address?
The problem this paper attempts to solve: given the complexity of large-scale, high-dimensional industrial data sets, propose an effective method for monitoring the nonlinear and dynamic characteristics of batch processes with ultra-large-scale data sets. Traditional statistical process monitoring methods struggle with large data volumes and high dimensionality, especially when the data are also nonlinear and dynamic. The paper therefore proposes an efficient iterative multiple dynamic kernel principal component analysis (IMDKPCA) method to meet these challenges.

### Specific description of the problem

1. **Large data and high dimension**: The data generated in modern industrial production processes are not only large in quantity but also high in dimension, which poses great challenges for traditional statistical process monitoring methods.
2. **Nonlinear and dynamic characteristics**: The data in industrial processes usually have nonlinear and dynamic characteristics, which traditional linear methods cannot handle effectively.
3. **Computational complexity**: Traditional methods such as kernel principal component analysis (KPCA) need to perform eigenvalue decomposition and matrix inversion operations. When the data scale is very large, these operations impose a huge computational burden and may even become infeasible.

### Proposed solutions

To overcome the above problems, the paper proposes the IMDKPCA method. Its main features include:

- **Avoid eigenvalue decomposition**: By constructing a new \(KK^T\) matrix and using the properties of symmetric matrices, the method takes the columns of the kernel matrix directly as input samples for an iterative procedure, thereby avoiding eigenvalue decomposition and greatly reducing the computational complexity.
- **Combine ARMA model**: The autoregressive moving average (ARMA) time-series model is combined with KPCA to construct the IDKPCA model, which deals with the dynamic and nonlinear behavior of industrial processes.
- **Applicable to ultra-large-scale data sets**: For ultra-large-scale data sets, the method can be solved quickly without pre-storing the kernel matrix, thereby effectively reducing the computational complexity.

### Application examples

The method is applied to fault monitoring of the penicillin fermentation process, and a comparative experiment with multi-way kernel principal component analysis (MKPCA) is carried out to verify its accuracy and applicability.

### Conclusion

Experiments on the penicillin fermentation process indicate that the IMDKPCA method achieves higher efficiency and accuracy in dealing with the nonlinear and dynamic characteristics of large-scale, high-dimensional data sets, with clear advantages in real-time monitoring and fault detection.
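
The idea of iterating over kernel columns instead of decomposing a stored kernel matrix can be illustrated with a minimal sketch. This is not the authors' algorithm: it swaps in plain power iteration on \(KK^T\), assumes an RBF kernel, skips kernel centering, and recovers only the leading component. The names `rbf_column` and `leading_kernel_component` and the parameters `gamma`, `n_sweeps`, and `tol` are hypothetical, chosen for illustration.

```python
import numpy as np


def rbf_column(X, j, gamma):
    """Column j of the RBF kernel matrix, computed on demand (assumed kernel)."""
    diff = X - X[j]
    return np.exp(-gamma * np.sum(diff * diff, axis=1))


def leading_kernel_component(X, gamma=0.1, n_sweeps=200, tol=1e-10, seed=0):
    """Estimate the leading eigenvector of K by power iteration on K K^T.

    The full n-by-n kernel matrix is never stored: each sweep rebuilds the
    columns k_j one at a time and accumulates (K K^T) v = sum_j k_j (k_j^T v).
    Kernel centering is omitted here for brevity.
    """
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)
    for _ in range(n_sweeps):
        acc = np.zeros(n)
        for j in range(n):
            k_j = rbf_column(X, j, gamma)
            acc += k_j * (k_j @ v)
        acc /= np.linalg.norm(acc)
        converged = np.linalg.norm(acc - v) < tol
        v = acc
        if converged:
            break
    return v


if __name__ == "__main__":
    # Synthetic data standing in for process measurements (illustrative only).
    X = np.random.default_rng(1).standard_normal((30, 3))
    v = leading_kernel_component(X)
    print(v.shape)
```

Because K is symmetric, \(KK^T = K^2\) shares its eigenvectors with K, which is why the iteration converges to a kernel principal direction. Memory use stays at O(n) per sweep, at the cost of recomputing the kernel columns on every pass.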