Decentralizing Test-time Adaptation under Heterogeneous Data Streams

Zixian Su, Jingwei Guo, Xi Yang, Qiufeng Wang, Kaizhu Huang
2024-11-16
Abstract: While Test-Time Adaptation (TTA) has shown promise in addressing distribution shifts between training and testing data, its effectiveness diminishes with heterogeneous data streams due to uniform target estimation. As previous attempts merely stabilize model fine-tuning over time to handle continually changing environments, they fundamentally assume a homogeneous target domain at any moment, leaving the intrinsic real-world data heterogeneity unresolved. This paper delves into TTA under heterogeneous data streams, moving beyond current model-centric limitations. By revisiting TTA from a data-centric perspective, we discover that decomposing samples into Fourier space facilitates an accurate data separation across different frequency levels. Drawing from this insight, we propose a novel Frequency-based Decentralized Adaptation (FreDA) framework, which transitions data from globally heterogeneous to locally homogeneous in Fourier space and employs decentralized adaptation to manage diverse distribution shifts. Interestingly, we devise a novel Fourier-based augmentation strategy to assist in decentralizing adaptation, which individually enhances sample quality for capturing each type of distribution shift. Extensive experiments across various settings (corrupted, natural, and medical environments) demonstrate the superiority of our proposed framework over the state of the art.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the ineffectiveness of existing Test-Time Adaptation (TTA) methods at handling distribution shifts under heterogeneous data streams. Specifically:

1. **Limitations of existing TTA methods**:
   - Current TTA methods usually assume that the target domain is homogeneous at any given moment, that is, all test samples share a similar type of distribution shift. In the real world, however, data is often heterogeneous, and different types of distribution shift may coexist and conflict with each other.
   - These methods focus mainly on model-level improvements, for example stabilizing the fine-tuning process to cope with a changing environment, but they do not address the heterogeneity of the data itself.
2. **Specific manifestations of the problem**:
   - When the model tries to adapt to several different, or even conflicting, distribution shifts simultaneously, it may encounter adaptation conflicts. For example, the parameter updates needed to adapt to changes in image brightness may conflict with those needed to adapt to texture changes.
   - Such conflicts prevent the model from generalizing to all of the distribution shifts it encounters, resulting in an irreversible decline in prediction performance.
3. **Solutions proposed in the paper**:
   - The authors propose a Frequency-based Decentralized Adaptation (FreDA) framework. Starting from the data level, this framework uses the Fourier transform to decompose samples into frequency space, which yields a more accurate separation of the data.
   - By converting globally heterogeneous data into locally homogeneous data and adopting a decentralized adaptation strategy, FreDA can better manage diverse distribution shifts.
   - In addition, a Fourier-based augmentation strategy is introduced. By increasing the number and quality of samples for each distribution shift type, it further improves the robustness and prediction ability of the model.

In summary, this paper re-examines TTA from a data-centric perspective and introduces a new framework to address the shortcomings of existing methods under heterogeneous data streams, thereby improving the adaptability and performance of the model in complex real-world environments.

### Related formulas

- **Fourier transform**:
  \[ F(x)(u, v)=\sum_{h=0}^{H-1}\sum_{w=0}^{W-1}x(h, w)\,e^{-j2\pi\left(\frac{hu}{H}+\frac{wv}{W}\right)} \]
  where \(H\) and \(W\) are the height and width of the image respectively, and \(u\) and \(v\) are frequency coordinates.
- **Magnitude and phase calculation**:
  \[ A(x)(u, v)=\sqrt{R^{2}(x)(u, v)+I^{2}(x)(u, v)} \]
  \[ P(x)(u, v)=\arctan\left(\frac{I(x)(u, v)}{R(x)(u, v)}\right) \]
  where \(R(x)\) and \(I(x)\) are the real and imaginary parts of \(F(x)\).
- **High-frequency component filtering**:
  \[ G(x)(u, v)=A(x)(u, v)\cdot M(u, v) \]
  where \(M(u, v)\) is a mask matrix that filters out low-frequency components and highlights high-frequency components.
- **Clustering optimization objective**:
  \[ \min_{C, Z}\sum_{i=1}^{n}\left\|A_{hf, i}-C Z_{i}\right\|_{2}^{2} \]
- **Parameter aggregation**:
  \[ \theta_{\text{global}}=\frac{\sum_{k=1}^{K}|D_{k}|\,\theta_{k}}{\sum_{j=1}^{K}|D_{j}|} \]
- **Total loss function**:
  \[ L_{\text{total}}=\frac{1}{n}\sum_{i = 1}^{n}H(y_