Abstract: Image segmentation in RGB space is a notoriously difficult task, where state-of-the-art methods are trained on thousands or even millions of annotated images. While their performance is impressive, it is still not perfect. We propose a novel image segmentation method that achieves similar segmentation quality without any training. Instead, we require an image sequence taken with a static camera and a single light source at varying positions, as used, for example, in photometric stereo.
What problem does this paper attempt to address?
This paper addresses the difficult task of image segmentation in RGB space, and specifically how to achieve high-quality segmentation without relying on large amounts of labeled data. The author proposes a new segmentation method based on shadow hints, which produces segmentation results from an image sequence captured by a static camera with a single light source placed at different positions.
### Problem Background
1. **Limitations of Traditional Methods**:
- Existing state-of-the-art image segmentation methods typically require thousands or even millions of labeled training images [Kirillov et al. 2023].
- Although these methods perform very well, their results are still not perfect, particularly in complex scenes.
2. **Advantages of the New Method**:
- The proposed method requires no labeled data. Instead, it uses the transitions between light and shadow to reveal the spatial structure of the scene and trace object contours.
- This makes it particularly suitable for setups such as photometric stereo, where foreground objects cast shadows onto background objects under changing illumination.
### Solution
To achieve this goal, the author proposes the following steps:
1. **Shadow Edge Detection**:
- Detect shadow-to-light transitions in every shadow mask via template matching, and combine these transitions into a per-pixel edge strength and direction.
- The combined edge strength for pixel $p$ and direction $d$ is:
\[
b_{p,d} = \frac{\sum_{l} \omega_{l,p,d} \cdot \sigma\left(\frac{E_{l,p,1} - E_{l,p,d}}{\beta}\right)}{\sum_{l} \omega_{l,p,d}}
\]
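where the sum runs over the shadow masks $l$, $\omega_{l,p,d} \in \{0,1\}$ is a binary weight, $\sigma$ is the sigmoid function, and $\beta$ scales the argument of the sigmoid, controlling how sharply differences between $E_{l,p,1}$ and $E_{l,p,d}$ are mapped to edge strength.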
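The formula is a weighted average of sigmoid responses over the shadow masks. The following is a minimal sketch for a single pixel and direction. It assumes (this is not stated in the summary) that $E_{l,p,d}$ is the template-matching error of the direction-$d$ template in mask $l$ and $E_{l,p,1}$ that of a reference template; the function names and the zero-weight fallback are hypothetical choices of ours.

```python
import math


def sigmoid(x):
    """Numerically stable logistic sigmoid 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def edge_strength(E_ref, E_dir, weights, beta):
    """Edge strength b_{p,d} for a single pixel p and direction d.

    Assumed meaning of the inputs (not confirmed by the paper summary):
      E_ref[l]:   E_{l,p,1}, matching error of the reference template in mask l
      E_dir[l]:   E_{l,p,d}, matching error of the direction-d template in mask l
      weights[l]: binary weight omega_{l,p,d} in {0, 1}
      beta:       positive scale of the sigmoid argument (smaller = sharper)
    """
    num = 0.0
    den = 0.0
    for e_ref, e_dir, w in zip(E_ref, E_dir, weights):
        num += w * sigmoid((e_ref - e_dir) / beta)
        den += w
    # Hypothetical fallback: with no contributing mask, report zero strength.
    return num / den if den > 0 else 0.0
```

Because the numerator and denominator use the same weights, $b_{p,d}$ stays in $(0, 1)$ whenever at least one mask contributes, independent of how many masks are available.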
2. **Sub-pixel Delaunay Triangulation**:
- Extract thin contours from the edge strength and direction via non-maximum suppression and double thresholding.
- Refine the position of each edge pixel to sub-pixel accuracy with a quadratic fit, which yields smoother contours.
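To illustrate the three operations of this step, here is a minimal sketch on a 1D edge-strength profile. This is a simplification we assume for brevity: on a 2D image, the neighbours would be sampled across the edge direction $d$. The thresholds `low` and `high` and both function names are hypothetical. The quadratic fit places a parabola through the maximum and its two neighbours and returns the offset of its vertex.

```python
def subpixel_offset(left, center, right):
    """Vertex offset of the parabola through (-1, left), (0, center), (1, right).

    For a local maximum (center > left, center >= right) the result lies in [-0.5, 0.5].
    """
    denom = left - 2.0 * center + right
    if denom == 0.0:
        return 0.0  # collinear samples: keep the integer position
    return 0.5 * (left - right) / denom


def subpixel_edges_1d(strength, low, high):
    """Sub-pixel edge positions along a 1D edge-strength profile.

    Simplification (assumption): the actual method works on 2D images, where
    neighbours would be taken across the edge direction instead of along a line.
    """
    n = len(strength)
    keep = [False] * n
    # Double (hysteresis) thresholding: keep a run of samples >= low only if
    # the run contains at least one strong sample >= high.
    i = 0
    while i < n:
        if strength[i] < low:
            i += 1
            continue
        j = i
        while j < n and strength[j] >= low:
            j += 1
        if max(strength[i:j]) >= high:
            keep[i:j] = [True] * (j - i)
        i = j
    # Non-maximum suppression, then quadratic refinement of each surviving peak.
    peaks = []
    for i in range(1, n - 1):
        s = strength[i]
        if keep[i] and s > strength[i - 1] and s >= strength[i + 1]:
            peaks.append(i + subpixel_offset(strength[i - 1], s, strength[i + 1]))
    return peaks
```

For example, `subpixel_edges_1d([0.0, 0.1, 0.5, 0.9, 0.7, 0.1, 0.0, 0.4, 0.45, 0.3, 0.0], 0.3, 0.8)` keeps only the peak near index 3 and shifts it slightly towards the stronger right neighbour; the weak run around index 8 never reaches `high` and is discarded.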
3. **Segmentation**:
- Use Delaunay triangulation to connect edge pixels into a 2D polygonal mesh.
- Iteratively merge triangles into larger fragments using a minimum spanning tree algorithm.
- Compute an aspect-ratio score $l_S$ for each fragment $S$:
\[
l_S = \frac{|S| - A_{\text{min}}}{\min_{e \in S}(|e|)}
\]
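where $|S|$ is the area of fragment $S$, $|e|$ is the length of edge $e$, and $A_{\text{min}}$ is the minimum fragment area. Fragments smaller than $A_{\text{min}}$ receive a negative score, which ensures that small fragments are merged.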
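The merging and scoring of step 3 can be sketched as follows, assuming the triangles and their adjacencies are already available. We use Kruskal's algorithm with a union-find structure as one standard way to build a minimum spanning forest; the summary does not say which MST variant the author uses. The edge weight (here assumed to be the edge strength along the shared side), the stopping `threshold`, and the reading of $e \in S$ as the triangle sides inside the fragment are all hypothetical choices.

```python
import math


class UnionFind:
    """Disjoint-set forest with path halving and union by rank."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        return True


def merge_triangles(num_triangles, adjacency, threshold):
    """Kruskal-style merging of adjacent triangles into fragments.

    adjacency: (weight, tri_a, tri_b) for every pair of triangles sharing a side.
    Assumption: weight is the edge strength along the shared side, so weak
    boundaries are merged first. Every union performed is an edge of a minimum
    spanning forest; merging stops at the (hypothetical) threshold.
    Returns one fragment label per triangle.
    """
    uf = UnionFind(num_triangles)
    for weight, a, b in sorted(adjacency):
        if weight >= threshold:
            break
        uf.union(a, b)
    labels = {}
    return [labels.setdefault(uf.find(t), len(labels)) for t in range(num_triangles)]


def triangle_area(p, q, r):
    """Unsigned area of triangle pqr from the 2D cross product."""
    return abs((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1])) / 2.0


def fragment_score(triangles, min_area):
    """l_S = (|S| - A_min) / min_{e in S} |e| for a fragment given as triangles.

    Assumption: the edges e are the sides of the triangles making up S.
    """
    area = sum(triangle_area(*tri) for tri in triangles)
    shortest = min(
        math.dist(a, b)
        for tri in triangles
        for a, b in ((tri[0], tri[1]), (tri[1], tri[2]), (tri[2], tri[0]))
    )
    return (area - min_area) / shortest
```

In this sketch, raising `threshold` accepts more spanning-forest edges and therefore yields coarser fragments. For example, the 3-4-5 right triangle `((0, 0), (4, 0), (0, 3))` has area 6 and shortest side 3, so with `min_area = 3.0` its score is 1.0, while with `min_area = 9.0` it becomes negative.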
4. **Experimental Verification**:
- The author compared the method against classical and learning-based image segmentation methods, including FH04 [Felzenszwalb and Huttenlocher 2004] and the Segment Anything Model (SAM23) [Kirillov et al. 2023].
- The results show that the method produces results comparable to SAM23 in many cases and is more robust, especially in textured regions.
### Conclusion
This method offers an alternative approach to image segmentation that does not rely on labeled data. Its results can be refined manually, and the segmentation granularity can be adjusted in real time. This also provides a new way to create labeled datasets for training learning-based image segmentation algorithms.
---
With these steps, the author addresses the problem of achieving high-quality image segmentation without large amounts of labeled data and demonstrates the method's potential in practical applications.