Asynchronous Perception Machine For Efficient Test-Time-Training

Rajat Modi, Yogesh Singh Rawat
2024-10-28
Abstract: In this work, we propose Asynchronous Perception Machine (APM), a computationally-efficient architecture for test-time-training (TTT). APM can process patches of an image one at a time in any order asymmetrically, and still encode semantic-awareness in the net. We demonstrate APM's ability to recognize out-of-distribution images without dataset-specific pre-training, augmentation or any pretext task. APM offers competitive performance over existing TTT approaches. To perform TTT, APM just distills the test sample's representation once. APM possesses a unique property: it can learn using just this single representation and starts predicting semantically-aware features. APM demonstrates potential applications beyond test-time-training: APM can scale up to a dataset of 2D images and yield semantic clusterings in a single forward pass. APM also provides first empirical evidence towards validating GLOM's insight, i.e. input percept is a field. Therefore, APM helps us converge towards an implementation which can do both interpolation and perception on shared connectionist hardware. Our code is publicly available at https://github.com/rajatmodi62/apm.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
This paper aims to solve several key problems in **Test-Time Training (TTT)**, especially the challenges of dealing with **Out-of-Distribution (OOD)** images. Specifically, the paper proposes the **Asynchronous Perception Machine (APM)** to address the following issues:

1. **The information bottleneck problem**
   - **Limitations of existing methods**: Traditional TTT methods require multiple forward propagations through multiple hidden layers, which makes the computational cost very high.
   - **APM's solution**: APM addresses the information bottleneck by learning the final representation directly from the input image, reducing the number of forward propagations through intermediate layers.
2. **Dependence on pretext tasks**
   - **Limitations of existing methods**: Existing TTT methods usually rely on data augmentation or pretext tasks (such as rotation or prompt-tuning), which are difficult to optimize in an online setting.
   - **APM's solution**: APM only needs to compute the representation of the test sample once during TTT and then over-fits on it, without any data augmentation or pretext task.
3. **Memory consumption of parallel perception architectures**
   - **Limitations of existing methods**: Transformer-based architectures need to project all input patches into a shared representation space, resulting in high memory consumption.
   - **APM's solution**: APM can process individual patches asynchronously and still encode semantic awareness, thus reducing memory usage.
4. **The need for immediate decision-making**
   - **Practical application scenarios**: For example, self-driving cars need to react immediately when encountering pedestrians to ensure human safety.
   - **APM's application**: APM can quickly adapt to new samples at test time and support immediate decisions, which makes it especially suitable for scenarios that require real-time responses.
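The "compute the representation once, then over-fit" idea and the one-patch-at-a-time querying described above can be illustrated with a small toy. The following is only a minimal sketch under stated assumptions, not the authors' architecture or training recipe: the teacher features are random stand-ins, the student is a hypothetical two-layer MLP that sees only a sinusoidal positional encoding of each patch location, and all names and sizes (`GRID`, `FEAT_DIM`, `positional_encoding`, `student`, `train_step`) are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a 4x4 grid of patch locations, 8-dim target features.
GRID, FEAT_DIM, POS_DIM, HIDDEN = 4, 8, 16, 64

def positional_encoding(coords, dim=POS_DIM):
    """Sinusoidal encoding of (row, col) coordinates given in [0, 1]."""
    freqs = 2.0 ** np.arange(dim // 4)             # dim/4 frequencies per axis
    angles = coords[:, :, None] * freqs * np.pi    # shape (N, 2, dim/4)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(len(coords), -1)            # shape (N, dim)

# Every (row, col) location of the patch grid, normalised to [0, 1].
rows, cols = np.meshgrid(np.arange(GRID), np.arange(GRID), indexing="ij")
coords = np.stack([rows.ravel(), cols.ravel()], axis=1) / (GRID - 1)
pos = positional_encoding(coords)

# ASSUMPTION: stand-in for a frozen teacher's per-location features of ONE
# test image. It is computed a single time; random values are placeholders.
teacher_feats = rng.normal(size=(GRID * GRID, FEAT_DIM))

# ASSUMPTION: a tiny two-layer MLP student (not the APM network itself).
params = {
    "W1": rng.normal(scale=POS_DIM ** -0.5, size=(POS_DIM, HIDDEN)),
    "b1": np.zeros(HIDDEN),
    "W2": rng.normal(scale=HIDDEN ** -0.5, size=(HIDDEN, FEAT_DIM)),
    "b2": np.zeros(FEAT_DIM),
}

def student(p, x):
    """Predict a feature vector for each encoded location in x."""
    return np.tanh(x @ p["W1"] + p["b1"]) @ p["W2"] + p["b2"]

def train_step(p, x, target, lr=0.05):
    """One full-batch gradient-descent step on the mean squared error."""
    h = np.tanh(x @ p["W1"] + p["b1"])
    err = h @ p["W2"] + p["b2"] - target
    loss = np.mean(err ** 2)
    g_out = 2.0 * err / err.size                   # dLoss/dOutput
    g_pre = (g_out @ p["W2"].T) * (1.0 - h ** 2)   # back through tanh
    grads = {"W1": x.T @ g_pre, "b1": g_pre.sum(0),
             "W2": h.T @ g_out, "b2": g_out.sum(0)}
    for name in p:
        p[name] -= lr * grads[name]
    return loss

# Over-fit on the single stored representation: no augmentation, no pretext task.
losses = [train_step(params, pos, teacher_feats) for _ in range(1000)]

# Query locations one at a time in a shuffled order; each query needs only
# that location's encoding, not the whole grid.
one_by_one = np.zeros_like(teacher_feats)
for i in rng.permutation(len(pos)):
    one_by_one[i] = student(params, pos[i:i + 1])[0]
all_at_once = student(params, pos)
```

In this toy, each location's prediction depends only on its own encoding, so querying locations individually in any order gives the same features as evaluating the whole grid in one batch. That is the property the sketch is meant to convey; it says nothing about the accuracy or memory figures of the real method.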
In summary, the main goal of APM is to improve the efficiency and performance of TTT, especially when dealing with OOD images, while reducing the consumption of computational and memory resources. Through these improvements, APM can achieve more efficient test-time training without relying on additional pre-training or data augmentation.