DT-LSD: Deformable Transformer-based Line Segment Detection

Sebastian Janampa, Marios Pattichis
2024-11-20
Abstract: Line segment detection is a fundamental low-level task in computer vision, and improvements in this task can impact more advanced methods that depend on it. Most new methods developed for line segment detection are based on Convolutional Neural Networks (CNNs). Our paper seeks to address challenges that prevent the wider adoption of transformer-based methods for line segment detection. More specifically, we introduce a new model called Deformable Transformer-based Line Segment Detection (DT-LSD) that supports cross-scale interactions, can be trained quickly, and addresses the drawbacks of LETR, its transformer-based predecessor. For faster training, we introduce Line Contrastive DeNoising (LCDN), a technique that stabilizes the one-to-one matching process and speeds up training by 34$\times$. We show that DT-LSD is faster and more accurate than LETR and outperforms all CNN-based models in terms of accuracy. On the Wireframe dataset, DT-LSD achieves 71.7 for $sAP^{10}$ and 73.9 for $sAP^{15}$; on the YorkUrban dataset, it achieves 33.2 for $sAP^{10}$ and 35.1 for $sAP^{15}$.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses challenges in existing Transformer-based methods for line segment detection that hinder their wider adoption. Specifically, the paper points out:

1. **Limitations of Existing Methods**:
   - Most new line segment detection methods are based on Convolutional Neural Networks (CNNs), although CNN-based methods require additional post-processing steps to generate the final prediction.
   - Transformer-based methods (such as LETR) can capture long-range dependencies between pixels, but they fall short in training speed and performance. In particular, LETR only enhances single-scale feature maps and lacks cross-scale interaction, which results in slow convergence and high computational complexity.
2. **Proposed New Method**:
   - The paper introduces a new model, Deformable Transformer-based Line Segment Detection (DT-LSD), which supports cross-scale interaction and can be trained quickly.
   - To accelerate training, the paper proposes the Line Contrastive DeNoising (LCDN) technique, which speeds up training by 34 times by stabilizing the one-to-one matching process.
3. **Main Contributions**:
   - A new end-to-end Transformer framework is proposed that outperforms CNN-based line segment detectors in accuracy. This is achieved by using the deformable attention mechanism.
   - An efficient training technique, Line Contrastive DeNoising (LCDN), is introduced. It reduces the required number of training epochs, enabling DT-LSD to converge within a similar number of epochs as CNN-based models.
   - Experimental results on two datasets (Wireframe and YorkUrban) show that DT-LSD outperforms the existing state-of-the-art methods in both structural and heatmap metrics.
   - This work provides an opportunity for line segment detectors to remove hand-designed post-processing by leveraging end-to-end Transformer models.

In conclusion, this paper aims to improve the performance and training efficiency of line segment detection by improving Transformer-based methods, thereby promoting further development in this field.
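
The $sAP^{10}$ and $sAP^{15}$ scores quoted in the abstract are structural average precision values at thresholds 10 and 15. The sketch below is a minimal illustration that assumes the definition commonly used in earlier wireframe-parsing work: a prediction counts as a true positive when the sum of squared distances between its endpoints and those of its nearest ground-truth segment is below the threshold, and each ground-truth segment can be matched only once. The function names `endpoint_distance` and `label_predictions` and the toy coordinates are hypothetical and are not taken from the paper. The full metric would additionally sort predictions by confidence over the whole dataset and integrate precision over recall.

```python
def endpoint_distance(pred, gt):
    """Sum of squared endpoint distances, minimized over both endpoint orderings.

    A segment is a pair of (x, y) points; its direction is ignored.
    """
    (p1, p2), (g1, g2) = pred, gt

    def sq(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

    return min(sq(p1, g1) + sq(p2, g2), sq(p1, g2) + sq(p2, g1))


def label_predictions(preds, gts, threshold):
    """Label each prediction as true positive (True) or false positive (False).

    Assumes `preds` is already sorted by descending confidence.
    Each ground-truth segment may be claimed by at most one prediction.
    """
    matched = [False] * len(gts)
    labels = []
    for pred in preds:
        if not gts:
            labels.append(False)
            continue
        dists = [endpoint_distance(pred, gt) for gt in gts]
        best = min(range(len(gts)), key=dists.__getitem__)
        if dists[best] < threshold and not matched[best]:
            matched[best] = True
            labels.append(True)
        else:
            labels.append(False)
    return labels


if __name__ == "__main__":
    # Toy data (hypothetical), not from the Wireframe or YorkUrban datasets.
    gts = [((0, 0), (10, 0)), ((0, 5), (0, 20))]
    preds = [
        ((1, 0), (10, 1)),      # close to the first ground-truth segment
        ((10, 0), (0, 0)),      # duplicate of the first segment, reversed
        ((0, 5), (3, 20)),      # squared distance 9 to the second segment
        ((50, 50), (60, 60)),   # far from everything
    ]
    print(label_predictions(preds, gts, threshold=10))  # [True, False, True, False]
    print(label_predictions(preds, gts, threshold=5))   # [True, False, False, False]
```

Under this assumed definition, a tighter threshold rejects loosely localized segments, which is consistent with the $sAP^{10}$ figures in the abstract being lower than the $sAP^{15}$ figures.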