Abstract: In this work, we propose Asynchronous Perception Machine (APM), a computationally-efficient architecture for test-time-training (TTT). APM can process patches of an image one at a time in any order *asymmetrically*, and *still encode* semantic-awareness in the net. We demonstrate APM's ability to recognize out-of-distribution images *without* dataset-specific pre-training, augmentation or any pretext task. APM offers performance competitive with existing TTT approaches. To perform TTT, APM just distills the test sample's representation *once*. APM possesses a unique property: it can learn using just this single representation and starts predicting semantically-aware features. APM demonstrates potential applications beyond test-time-training: APM can scale up to a dataset of 2D images and yield semantic-clusterings in a single forward pass. APM also provides the first empirical evidence towards validating GLOM's insight, i.e., that the input percept is a field. Therefore, APM helps us converge towards an implementation which can do *both* interpolation and perception on a *shared*-connectionist hardware. Our code is publicly available at https://github.com/rajatmodi62/apm.

What problem does this paper attempt to address?

This paper aims to solve several key problems in **Test-Time Training (TTT)**, especially the challenges of dealing with **Out-of-Distribution (OOD)** images. Specifically, the paper proposes the **Asynchronous Perception Machine (APM)** to address the following issues:

1. **The information bottleneck problem**:
   - **Limitations of existing methods**: Traditional TTT methods require multiple forward propagations through multiple hidden layers, which makes the computational cost very high.
   - **APM's solution**: APM solves the information bottleneck problem by directly learning the final representation from the input image and reducing the number of forward propagations in the intermediate layers.
2. **Dependence on pretext tasks**:
   - **Limitations of existing methods**: Existing TTT methods usually rely on data augmentation or pretext tasks (such as rotation, prompt-tuning, etc.), which are difficult to optimize in an online setting.
   - **APM's solution**: APM only needs to calculate the representation of the test sample once during the TTT process and over-fit on this basis, without any data augmentation or pretext tasks.
3. **Memory consumption of parallel perception architectures**:
   - **Limitations of existing methods**: Transformer-based architectures need to project all input patches into a shared representation space, resulting in high memory consumption.
   - **APM's solution**: APM can process individual patches asynchronously and still encode semantic awareness, thus reducing memory usage.
4. **The need for immediate decision-making**:
   - **Practical application scenarios**: For example, self-driving cars need to react immediately when encountering pedestrians to ensure human safety.
   - **APM's application**: APM can quickly adapt to new samples during testing and make immediate decisions, which is especially suitable for scenarios that require real-time responses.
In summary, the main goal of APM is to improve the efficiency and performance of TTT, especially when dealing with OOD images, while reducing the consumption of computational and memory resources. Through these improvements, APM can achieve more efficient test-time training without relying on additional pre-training or data augmentation.
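
The idea of distilling a test sample's representation once and then over-fitting on it, while visiting image locations one at a time in any order, can be illustrated with a toy sketch. This is not APM's architecture or the authors' code: the linear student, the `positional_code` encoding, the hand-written `targets` (standing in for a representation computed once by some teacher), and all hyperparameters are assumptions made purely for illustration.

```python
import math
import random


def positional_code(index):
    """Hypothetical encoding of a patch location as a small feature vector."""
    return [1.0, math.sin(index), math.cos(index),
            math.sin(2 * index), math.cos(2 * index)]


def predict(weights, code):
    """Linear student: one output value per row of `weights`."""
    return [sum(w * c for w, c in zip(row, code)) for row in weights]


def mean_squared_error(weights, targets):
    """Average squared error between student outputs and the fixed targets."""
    total, count = 0.0, 0
    for index, target in enumerate(targets):
        output = predict(weights, positional_code(index))
        total += sum((o - t) ** 2 for o, t in zip(output, target))
        count += len(target)
    return total / count


def overfit_single_sample(targets, epochs=200, learning_rate=0.1, seed=0):
    """Over-fit the student to one sample's targets, one location per step."""
    rng = random.Random(seed)
    feature_dim = len(targets[0])
    code_dim = len(positional_code(0))
    weights = [[0.0] * code_dim for _ in range(feature_dim)]
    order = list(range(len(targets)))
    for _ in range(epochs):
        rng.shuffle(order)       # locations are visited in arbitrary order
        for index in order:      # only one location is processed at a time
            code = positional_code(index)
            output = predict(weights, code)
            for d in range(feature_dim):
                error = output[d] - targets[index][d]
                for k in range(code_dim):
                    # Gradient step on 0.5 * error**2 for this location only.
                    weights[d][k] -= learning_rate * error * code[k]
    return weights


# Stand-in for the representation distilled once from the test sample:
# four locations, each with a 2-dimensional target feature (made-up values).
targets = [[0.2, -0.5], [0.9, 0.1], [-0.3, 0.7], [0.4, 0.4]]

initial_loss = mean_squared_error([[0.0] * 5 for _ in range(2)], targets)
weights = overfit_single_sample(targets)
final_loss = mean_squared_error(weights, targets)
print(initial_loss, final_loss)
```

Because each update touches a single location, memory use in this sketch does not grow with the number of locations processed together, which loosely mirrors the memory argument made in the summary. The real method's network, loss, and teacher are described in the paper and its repository.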

Asynchronous Perception Machine For Efficient Test-Time-Training
