Improved reconstruction weight-based locally linear embedding algorithm
Fangyuan Liu, Kewen Xia, Wenjia Niu
DOI: https://doi.org/10.11834/jig.170301
2018-01-01
Journal of Image and Graphics
Abstract: Objective Advances in science and technology have reduced the cost of data collection, and data volumes have grown geometrically, making dimensionality reduction an important part of machine learning. Manifold learning is a nonlinear dimensionality reduction technique that is widely used for visualization, feature extraction, and reducing the computational complexity of processing high-dimensional data. The locally linear embedding (LLE) algorithm is a classical manifold learning algorithm in machine learning and data mining. Its basic idea is that any sampling point and its neighbors lie on a locally linear patch: each nearest neighbor is assigned a weight, and the sampling point is represented as a linear combination of its nearest neighbors, with the weights chosen to minimize the reconstruction error. An improved reconstruction weight-based LLE (IRWLLE) algorithm is proposed to overcome the weaknesses of the LLE algorithm on data with noise, large curvature, or sparse sampling.

Method The original LLE algorithm considers only distance factors and ignores structural factors. To overcome this shortcoming, geodesic distance is used to describe the manifold structure and to redefine the reconstruction weights in LLE, which gain a structural component and a distance component. Any sample is selected as the center point, and its nearest neighbors according to Euclidean distance are selected as the local neighborhood. In this neighborhood, the structural weight is defined as the ratio of the Euclidean distance to the geodesic distance between the center and a neighboring point, and the distance weight is defined from the ratio of the geodesic distance to the median geodesic distance between the center and the neighboring points. The product of the structural and distance weights is defined as the reconstruction weight; thus, the structure and distance information of the manifold are combined. The geodesic distance is computed with the classic Dijkstra algorithm, as is common in the Isomap algorithm. For the distance weight, the median geodesic distance in a local neighborhood is fixed. The farther a neighboring point is from the center sample point, the smaller its distance weight, which is in line with the idea that "a greater distance from the neighborhood center means a smaller contribution to the reconstruction of the center." Dividing the geodesic distance by the median reduces the effect of noise on the weight to a certain extent. For the structural weight, the ratio of Euclidean distance to geodesic distance measures the linearity of the local neighborhood. The farther a neighboring point lies from the linear plane, the smaller its contribution to reconstructing the center point and the smaller its structural weight. This further emphasizes the importance of structure to the weight and enhances noise immunity.

Result Experiments are conducted on classical artificial data sets, namely Swiss roll, S-curve, and Helix, with noise added to the data and with sparse sampling used to generate the data sets. The proposed algorithm is compared with the original LLE algorithm and the Hessian LLE (HLLE) algorithm. Results show that the IRWLLE algorithm outperforms the LLE and HLLE algorithms in maintaining the neighbor relations of the manifold and yields a better embedding of the manifold. In particular, IRWLLE exhibits stronger robustness on the large-curvature data set Helix. Face recognition experiments on the ORL and Yale face databases are conducted with a nearest-neighbor classifier, and the recognition results of the IRWLLE algorithm are compared with those of the LLE algorithm. For the ORL data set, the recognition rate of the IRWLLE algorithm is 90%, whereas that of the original LLE algorithm is 85.5%. For the Yale data set, the recognition rate of the IRWLLE algorithm is 88%, whereas that of the original LLE algorithm is 75%. Therefore, IRWLLE also substantially improves the face recognition rate.

Conclusion The proposed IRWLLE algorithm builds on the original LLE algorithm. It introduces not only manifold distance information but also structural information into the reconstruction weight, thereby effectively reducing the interference from noise and from data lying off the manifold. The IRWLLE algorithm is highly robust to noisy data and can handle sparsely sampled and large-curvature data. It also improves the face recognition rate.
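
The weight construction described in the Method part can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions because the abstract does not fix them: the function names, the separate parameter `graph_k` (the geodesic graph is built with fewer neighbors than the local neighborhood, since otherwise every neighbor is directly connected to the center and the Euclidean-to-geodesic ratio is trivially 1), and the exponential form of the distance weight, which is only one possible function that decreases as the geodesic distance divided by the median grows. How the product enters the LLE reconstruction step (for example, any normalization) is not stated in the abstract and is left out.

```python
import heapq
import math
from statistics import median


def knn_graph(points, graph_k):
    """Symmetric nearest-neighbor graph with Euclidean edge lengths."""
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        nearest = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )[:graph_k]
        for d, j in nearest:
            graph[i][j] = d
            graph[j][i] = d
    return graph


def dijkstra(graph, source):
    """Shortest-path lengths from source, used as approximate geodesic distances."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def reconstruction_weights(points, center, k, graph_k):
    """Structural weight times distance weight for the k Euclidean neighbors of center."""
    geo = dijkstra(knn_graph(points, graph_k), center)
    neighbors = sorted(
        (j for j in range(len(points)) if j != center),
        key=lambda j: math.dist(points[center], points[j]),
    )[:k]
    if any(j not in geo for j in neighbors):
        raise ValueError("graph is disconnected; increase graph_k")
    med = median(geo[j] for j in neighbors)
    weights = {}
    for j in neighbors:
        euclid = math.dist(points[center], points[j])
        structural = euclid / geo[j]        # 1 on a straight patch, smaller when curved
        distance = math.exp(-geo[j] / med)  # ASSUMPTION: hypothetical decreasing form
        weights[j] = structural * distance
    return weights


# Example: 21 points on a half circle, weights around the middle point.
points = [(math.cos(math.pi * i / 20), math.sin(math.pi * i / 20)) for i in range(21)]
weights = reconstruction_weights(points, center=10, k=4, graph_k=2)
for j in sorted(weights):
    print(j, round(weights[j], 4))
```

In this example the two adjacent points receive larger weights than the two points one step farther along the arc, because the latter have both a longer geodesic distance and a Euclidean-to-geodesic ratio below 1.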