Attention Mono-depth: Attention-Enhanced Transformer for Monocular Depth Estimation of Volatile Kiln Burden Surface

Cong Liu, Chaobo Zhang, Xiaojun Liang, Zhiming Han, Yiming Li, Chunhua Yang, Weihua Gui, Wen Gao, Xiaohao Wang, Xinghui Li
DOI: https://doi.org/10.1109/tcsvt.2024.3479412
IF: 5.859
2024-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Accurate estimation of burden surface depth plays a crucial role in constructing the temperature field and optimizing reaction control in volatile kilns. However, most image-based depth estimation techniques require high-quality input images and achieve limited accuracy, which restricts their application under harsh industrial working conditions such as high temperature, heavy dust, and dense smoke. In this study, a deep learning-based monocular depth estimation model is proposed to measure the burden surface depth in the volatile kiln head zone. The proposed model integrates an encoder-decoder network with an attention module. The encoder-decoder network outputs a set of deep semantic features, while the attention module fuses multi-level features to predict a probability distribution over depth intervals for each pixel. A volatile kiln prototype is designed and constructed to generate image datasets of the kiln head zone that approximate real data collected from industrial production sites. Results demonstrate that the proposed model achieves a depth prediction error of RMSE = 11.008 mm for the burden surface region, outperforming state-of-the-art neural networks and the traditional depth-from-defocus method. Code and datasets are available at https://github.com/LLLcong/Attention-MonoDepth.
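The abstract does not say how the per-pixel probability distribution over depth intervals is turned into a single depth value. One common choice in bin-based depth estimation, assumed here and not confirmed as the authors' method, is the probability-weighted average of the interval centers. The sketch below illustrates that assumption for one pixel, together with the RMSE metric the abstract reports. The function names `expected_depth` and `rmse`, the interval edges, and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def expected_depth(probs, bin_edges):
    """Collapse one pixel's probabilities over depth intervals into a depth.

    Assumption (not taken from the paper): depth is the probability-weighted
    mean of the interval centers. `bin_edges` holds len(probs) + 1 edges in mm.
    """
    if len(bin_edges) != len(probs) + 1:
        raise ValueError("bin_edges must have exactly one more entry than probs")
    total = sum(probs)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    centers = [(lo + hi) / 2.0 for lo, hi in zip(bin_edges[:-1], bin_edges[1:])]
    # Dividing by the total tolerates probabilities that are not exactly normalized.
    return sum(p * c for p, c in zip(probs, centers)) / total


def rmse(pred, target):
    """Root-mean-square error between two equal-length depth sequences (mm)."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and of equal length")
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))


# Hypothetical example: four depth intervals spanning 0 to 400 mm.
edges_mm = [0.0, 100.0, 200.0, 300.0, 400.0]
pixel_probs = [0.1, 0.6, 0.2, 0.1]
depth_mm = expected_depth(pixel_probs, edges_mm)
error_mm = rmse([depth_mm, 200.0], [170.0, 210.0])
```

Taking the expectation rather than the most probable interval yields a continuous depth value instead of one quantized to the interval width, which is the usual motivation for this design in bin-based estimators.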