When Semantic Segmentation Meets Frequency Aliasing

Linwei Chen, Lin Gu, Ying Fu
2024-03-25
Abstract: Despite recent advancements in semantic segmentation, where and what pixels are hard to segment remains largely unexplored. Existing research only separates an image into easy and hard regions and empirically observes the latter are associated with object boundaries. In this paper, we conduct a comprehensive analysis of hard pixel errors, categorizing them into three types: false responses, merging mistakes, and displacements. Our findings reveal a quantitative association between hard pixels and aliasing, which is distortion caused by the overlapping of frequency components in the Fourier domain during downsampling. To identify the frequencies responsible for aliasing, we propose using the equivalent sampling rate to calculate the Nyquist frequency, which marks the threshold for aliasing. Then, we introduce the aliasing score as a metric to quantify the extent of aliasing. While positively correlated with the proposed aliasing score, three types of hard pixels exhibit different patterns. Here, we propose two novel de-aliasing filter (DAF) and frequency mixing (FreqMix) modules to alleviate aliasing degradation by accurately removing or adjusting frequencies higher than the Nyquist frequency. The DAF precisely removes the frequencies responsible for aliasing before downsampling, while the FreqMix dynamically selects high-frequency components within the encoder block. Experimental results demonstrate consistent improvements in semantic segmentation and low-light instance segmentation tasks. The code is available at:
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper primarily aims to address the issue of "hard pixels" in semantic segmentation, that is, pixels that are difficult to segment correctly, which earlier work had only loosely associated with object boundaries. Specifically:

1. **Hard Pixel Classification**: The paper categorizes hard pixel errors into three types:
   - **False Responses**: Predicting boundaries in non-target areas where none exist.
   - **Merging Mistakes**: Failing to predict boundaries within the target area, so that two objects are incorrectly merged or a prediction is missing.
   - **Displacements**: The predicted boundary deviates from the true boundary position.
2. **Quantitative Analysis**: The study finds that all three error types are positively correlated with aliasing, although each exhibits a different pattern. Aliasing is the distortion that arises during downsampling when frequency components above the Nyquist frequency overlap with lower-frequency components in the Fourier domain. The paper uses the equivalent sampling rate to compute the Nyquist frequency and introduces an aliasing score to quantify the extent of aliasing.
3. **Solutions**: To mitigate the negative effects of aliasing, the paper proposes two modules:
   - **De-Aliasing Filter (DAF)**: Precisely removes the frequency components above the Nyquist frequency, which are responsible for aliasing, before downsampling.
   - **Frequency Mixing Module (FreqMix)**: Dynamically selects high-frequency components within the encoder block, adjusting the balance between high and low frequencies.

Experimental validation shows that these two modules yield consistent improvements in standard semantic segmentation tasks as well as in instance segmentation under low-light conditions.
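
To make the aliasing score and the de-aliasing step concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation. It assumes plain stride-`s` downsampling, so the Nyquist frequency is taken as `1 / (2 * s)` cycles per pixel (the paper's equivalent sampling rate may be defined differently). It also assumes the aliasing score is the fraction of spectral power above that threshold. The function names `freq_magnitude`, `aliasing_score`, and `de_alias_filter` are hypothetical.

```python
import numpy as np


def freq_magnitude(h, w):
    """Per-bin normalized frequency (cycles/pixel): the larger of |fy| and |fx|."""
    fy = np.abs(np.fft.fftfreq(h))[:, None]
    fx = np.abs(np.fft.fftfreq(w))[None, :]
    return np.maximum(fy, fx)


def aliasing_score(x, stride=2):
    """Assumed definition: share of spectral power above the Nyquist frequency."""
    nyquist = 0.5 / stride  # assumption: sampling rate 1/stride
    power = np.abs(np.fft.fft2(x)) ** 2
    total = power.sum()
    if total == 0:
        return 0.0
    return float(power[freq_magnitude(*x.shape) > nyquist].sum() / total)


def de_alias_filter(x, stride=2):
    """Ideal low-pass: zero every frequency bin above the Nyquist frequency."""
    nyquist = 0.5 / stride
    spectrum = np.fft.fft2(x)
    spectrum[freq_magnitude(*x.shape) > nyquist] = 0
    # The mask is symmetric in +/- frequency, so the inverse transform is real
    # up to floating-point error.
    return np.fft.ifft2(spectrum).real


# Toy feature map: one component below and one above the stride-2 Nyquist (1/4).
cols = np.arange(32)
low = np.tile(np.cos(2 * np.pi * cols / 16), (32, 1))       # 1/16 cycles/pixel
high = np.tile(np.cos(2 * np.pi * 3 * cols / 8), (32, 1))   # 3/8 cycles/pixel
x = low + high

print(aliasing_score(x))                    # about 0.5: equal power in each cosine
print(aliasing_score(de_alias_filter(x)))   # close to 0 after filtering
naive = x[::2, ::2]                         # high component folds to a lower frequency
clean = de_alias_filter(x)[::2, ::2]        # only the low component survives
```

In this toy case the filtered and downsampled map matches the downsampled low-frequency component, whereas the naive downsample is contaminated by the folded 3/8 component. A hard cutoff like this is the simplest choice; it says nothing about how FreqMix reweights frequencies inside the encoder.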