Inverse Reflectance Model Based on Deep Learning
Wang Xi, Jian Zhenxiong, Ren Mingjun
DOI: https://doi.org/10.3788/AOS230615
2023-01-01
Acta Optica Sinica
Abstract: Objective To improve the ability of photometric stereo to handle isotropic non-Lambertian reflectance, this paper proposes an inverse reflectance model based on deep learning that achieves highly accurate surface normal estimation. Non-Lambertian reflectance is an important factor that degrades the performance of optical measurement techniques such as fringe projection. To the best of our knowledge, photometric stereo is the only technology that can, in theory, overcome the effect of non-Lambertian reflectance. Traditional non-Lambertian photometric stereo methods rely on robust estimation, parameterized reflectance models, or general reflectance properties, which in essence apply different mathematical techniques to the reflectance model. With the introduction of deep learning, it has become possible to establish the inverse reflectance model directly, which significantly increases the ability of photometric stereo to handle non-Lambertian reflectance. Representative supervised deep learning methods are CNN-PS and PS-FCN. CNN-PS directly maps the observation map, which records the intensities under different lightings, to the surface normal according to the orientation consistency cue; its performance decreases significantly when only a small number of lights is available. PS-FCN simulates the normal estimation process of the pixel-wise inverse reflectance model and employs neighborhood information to give a robust surface normal estimate for scenes with sparse lights. The pixel-wise inverse reflectance model cannot describe non-Lambertian reflectance globally, a shortcoming that has recently been addressed by introducing collocated light. However, the collocated-light-based inverse reflectance model still has theoretical limitations.
Therefore, this paper attempts to remedy the theoretical defect of the collocated-light-based inverse reflectance model by effectively extracting the image features related to the azimuth difference and designing a deep-learning-based inverse reflectance model. Methods We first analyze the theoretical limitation of the collocated-light-based inverse reflectance model, then design the three-stage subnetworks of the proposed model, and finally train the model with new training strategies. The theoretical defect mainly comes from the assumption of Eq. (4), namely that the main direction a lies in the plane spanned by l and v. Under this assumption, the BRDF input Delta phi is replaced by the value l^T v. However, l^T v is not identical to Delta phi in most circumstances, and Delta phi depends strongly on the unknown surface normal. The proposed model, shown in Fig. 1, consists of three subnetworks, i.e., the azimuth difference subnetwork, the inverse reflectance model subnetwork, and the surface normal estimation subnetwork. The first-stage subnetwork maps the image o under arbitrary lighting, the collocated image o(0), and the lighting map l to the Delta phi map, and a max-pooling fused feature is introduced to represent the surface normal. The second-stage subnetwork realizes the ideal inverse reflectance model at the image-feature level. The output of this subnetwork could be used directly to compute the surface normal with the least-squares algorithm, but the shadow threshold directly and strongly influences the estimation accuracy. Thus, the third-stage subnetwork is designed to avoid error accumulation and achieve accurate surface normal estimation. To train the proposed network, a new supplementary training dataset is designed to retain the low-reflectance data and provide SVBRDF scenes.
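The gap between Delta phi and the dot product of l and v can be checked numerically. The following Python sketch is not taken from the paper: it assumes the common isotropic BRDF parameterization in which Delta phi is the angle between l and v after both are projected onto the tangent plane of the normal n, and the function names and test vectors are illustrative only. For one fixed pair l, v it evaluates Delta phi for two different normals.

```python
import numpy as np


def unit(x):
    """Return x scaled to unit length."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x)


def azimuth_difference(n, l, v):
    """Angle (radians) between l and v projected onto the tangent plane of n.

    Assumed definition of Delta phi (standard isotropic BRDF parameterization).
    Undefined when l or v is parallel to n, because the projection vanishes.
    """
    n, l, v = unit(n), unit(l), unit(v)
    l_t = l - np.dot(n, l) * n  # tangential part of the light direction
    v_t = v - np.dot(n, v) * n  # tangential part of the view direction
    cos_dphi = np.dot(l_t, v_t) / (np.linalg.norm(l_t) * np.linalg.norm(v_t))
    return np.arccos(np.clip(cos_dphi, -1.0, 1.0))


# Illustrative setup: camera along +z, light tilted 30 degrees toward +x.
v = unit([0.0, 0.0, 1.0])
l = unit([np.sin(np.radians(30.0)), 0.0, np.cos(np.radians(30.0))])

# Two hypothetical surface normals observed under the same l and v.
n_1 = unit([0.3, 0.2, 1.0])
n_2 = unit([-0.2, 0.4, 1.0])

dphi_1 = azimuth_difference(n_1, l, v)
dphi_2 = azimuth_difference(n_2, l, v)

print("l . v       :", np.dot(l, v))
print("Delta phi 1 :", np.degrees(dphi_1), "deg")
print("Delta phi 2 :", np.degrees(dphi_2), "deg")
```

The dot product of l and v is the same in both cases, yet the two values of Delta phi differ by tens of degrees. Under the assumed definition the quantities are linked by l . v = cos(theta_i) cos(theta_o) + sin(theta_i) sin(theta_o) cos(Delta phi), where theta_i and theta_o are the angles of l and v to n, so recovering Delta phi from l . v alone requires knowledge of the normal.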
The three subnetworks are first trained separately to obtain initial parameters for each subnetwork and are then combined to fine-tune the parameters. Results and Discussions An ablation experiment is used to verify the effectiveness of the network design, and synthetic and real experiments are used to analyze the performance of the proposed method. PS-FCN, CNN-PS, and the network proposed by Wang et al., denoted by CH20, IK18, and WJ20, respectively, serve as comparison methods. As shown in Table 2, the ablation experiment illustrates that introducing the max-pooling fusion feature benefits the extraction of image features related to Delta phi and the shading, and that the azimuth difference subnetwork effectively remedies the defect of the collocated-light-based inverse reflectance model so that isotropic reflectance is handled better. The synthetic experiment validates that the proposed method achieves the best performance on scenes with dense lights, sparse lights, and SVBRDF. Figure 5 shows the superior performance of the proposed method over WJ20 on the sparse-light scene, which demonstrates the necessity of breaking the theoretical limitation of the collocated-light-based inverse reflectance model. The real experiment on the benchmark DiLiGenT dataset demonstrates the state-of-the-art performance of the proposed method. Table 6 and Table 7 show that our method achieves an average surface normal estimation error of 5.90 degrees on real scenes, and that the improvement is significant in the sparse-light scene. Conclusions We design an inverse reflectance model based on deep learning to handle isotropic non-Lambertian reflectance; it remedies the theoretical defect of the collocated-light-based inverse reflectance model by effectively extracting the image features related to the azimuth difference.
The proposed model contains three subnetworks: the azimuth difference subnetwork, the inverse reflectance model subnetwork, and the surface normal estimation subnetwork. The first two subnetworks realize the inverse mapping from the intensity to the dot product of the surface normal and the lighting direction, and the third subnetwork fully exploits the image features extracted by these two subnetworks to estimate the surface normal accurately. The proposed method has three characteristics, i.e., the introduction of the max-pooling fusion feature to extract features related to Delta phi, an inverse reflectance model based on image features, and a staged training strategy. The ablation experiment confirms the rationality of the network design, and the synthetic experiments validate that the proposed method can simultaneously handle 100 classical isotropic reflectances. The real experiments on the benchmark DiLiGenT dataset show that the proposed method achieves accurate surface normal estimation with an average error of 5.90 degrees. The synthetic and real experiments validate the state-of-the-art performance of the proposed method. In future work, we would like to inversely model the challenging anisotropic reflectance and to remove the limitation of parallel lighting and orthographic cameras in photometric stereo.
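The least-squares step mentioned in the Methods, and its sensitivity to the shadow threshold, can be illustrated with a small Python sketch. It is a toy example under stated assumptions, not the authors' implementation: the shading values (the dot product of the normal and each lighting direction) are generated synthetically instead of being predicted by a network, attached shadows are modeled by clamping negative values to zero, and the function names, the ring of eight lights, and the threshold of 0.05 are all hypothetical choices.

```python
import numpy as np


def normal_from_shading(lights, shading, shadow_threshold):
    """Least-squares normal from per-light shading values s_j ~ n . l_j.

    Observations with s_j <= shadow_threshold are treated as shadowed
    and excluded from the linear system lights @ n = shading.
    """
    keep = shading > shadow_threshold
    if np.count_nonzero(keep) < 3:
        raise ValueError("need at least three unshadowed observations")
    n, *_ = np.linalg.lstsq(lights[keep], shading[keep], rcond=None)
    return n / np.linalg.norm(n)


def angular_error_deg(n_est, n_true):
    """Angle in degrees between two unit normals."""
    cos_angle = np.clip(np.dot(n_est, n_true), -1.0, 1.0)
    return np.degrees(np.arccos(cos_angle))


# Hypothetical ring of 8 parallel lights, each 60 degrees away from the z axis.
azimuths = np.radians(np.arange(0.0, 360.0, 45.0))
tilt = np.radians(60.0)
lights = np.stack(
    [np.sin(tilt) * np.cos(azimuths),
     np.sin(tilt) * np.sin(azimuths),
     np.full_like(azimuths, np.cos(tilt))],
    axis=1,
)

# Ground-truth normal tilted toward +x so that some lights fall in shadow.
n_true = np.array([0.8, 0.0, 0.6])

# Ideal output of an inverse reflectance model: n . l, clamped in shadow.
shading = np.maximum(lights @ n_true, 0.0)

# With a threshold, shadowed observations are dropped before the fit.
n_thresh = normal_from_shading(lights, shading, shadow_threshold=0.05)
# With a negative threshold, the clamped zeros are kept and bias the fit.
n_naive = normal_from_shading(lights, shading, shadow_threshold=-1.0)

err_thresh = angular_error_deg(n_thresh, n_true)
err_naive = angular_error_deg(n_naive, n_true)
print("error with shadow threshold   :", err_thresh, "deg")
print("error without shadow threshold:", err_naive, "deg")
```

In this toy setting three of the eight lights are shadowed. Excluding them recovers the normal essentially exactly, while keeping the clamped zeros shifts the estimate by many degrees. With noisy predicted shading the choice of threshold becomes a trade-off between discarding valid data and admitting shadowed data, which is consistent with the motivation given above for a learned third-stage subnetwork.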