Abstract: Object-centric scene decompositions are important representations for downstream tasks in fields such as computer vision and robotics. The recently proposed Slot Attention module, already leveraged by several derivative works for image segmentation and object tracking in videos, is a deep learning component which performs unsupervised object-centric scene decomposition on input images. It is based on an attention architecture, in which latent slot vectors, which hold compressed information on objects, attend to localized perceptual features from the input image. In this paper, we show that design decisions on normalizing the aggregated values in the attention architecture have considerable impact on the capabilities of Slot Attention to generalize to a higher number of slots and objects than seen during training. We argue that the original Slot Attention normalization scheme discards information on the prior assignment probability of pixels to slots, which impairs its generalization capabilities. Based on these findings, we propose and investigate alternative normalization approaches which increase the generalization capabilities of Slot Attention to varying slot and object counts, resulting in performance gains on the task of unsupervised image segmentation.

### What problem does this paper attempt to solve?

This paper mainly explores the impact of the design choices of the attention value normalization method in the **Slot Attention** module on the generalization ability of the model. Specifically, the researchers are concerned with how these design decisions affect the performance of Slot Attention when dealing with a larger number of objects or slots than during training.

#### Main problems:

1. **Limitations of existing normalization schemes**: The paper points out that the normalization scheme in the original Slot Attention discards prior probability information about pixel-to-slot assignments, which impairs its generalization ability.
2. **The need to improve generalization ability**: In order to enable Slot Attention to better handle different numbers of objects and slots, new normalization methods need to be explored.

#### Solutions:

The authors propose several alternative normalization methods and verify their effects through theoretical analysis and experiments. Specifically, these include:

- **Weighted Sum Normalization**: Compared with weighted mean normalization, this method retains more information about the input distribution ratio, which helps to improve the generalization ability of the model.
- **Batch Normalization**: By introducing batch normalization to dynamically adjust the normalization factor, it can better adapt to different numbers of objects and slots.

#### Experimental results:

Through experiments on the CLEVR and MOVi-C datasets, the authors demonstrate the superior performance of the newly proposed normalization methods when dealing with different numbers of objects. In particular, in the case of using 11 slots during evaluation, the new normalization methods significantly outperform the baseline methods.

### Summary:

This paper aims to improve the generalization ability of Slot Attention when dealing with different numbers of objects by improving the normalization method in the Slot Attention module, thereby providing better performance for tasks such as unsupervised image segmentation.
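
The difference between weighted mean and weighted sum normalization described above can be illustrated with a small sketch. This is not the authors' implementation: the function name `slot_updates`, the `scale` parameter, and the choice of dividing the weighted sum by the number of inputs are assumptions made for illustration only. The example gives every input the same value, so any difference between slot updates can only come from how much attention each slot receives.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def slot_updates(logits, values, mode="mean", scale=None, eps=1e-8):
    """Aggregate input values into per-slot updates.

    logits[n][k]: attention score of input n for slot k.
    values[n]:    feature vector (list of floats) of input n.
    mode="mean":  divide by the attention mass each slot received
                  (weighted mean, as in the original scheme).
    mode="sum":   divide by a constant that does not depend on the slot
                  (hypothetical choice here: the number of inputs).
    """
    n_inputs, n_slots, dim = len(logits), len(logits[0]), len(values[0])
    # Softmax over slots: each input distributes one unit of attention.
    attn = [softmax(row) for row in logits]
    updates = []
    for k in range(n_slots):
        mass = sum(attn[n][k] for n in range(n_inputs))
        if mode == "mean":
            denom = mass + eps
        elif mode == "sum":
            denom = float(scale) if scale is not None else float(n_inputs)
        else:
            raise ValueError(f"unknown mode: {mode}")
        summed = [
            sum(attn[n][k] * values[n][d] for n in range(n_inputs))
            for d in range(dim)
        ]
        updates.append([s / denom for s in summed])
    return updates


# Three inputs prefer slot 0, one input prefers slot 1; all values are equal.
logits = [[2.0, 0.0], [2.0, 0.0], [2.0, 0.0], [0.0, 2.0]]
values = [[1.0], [1.0], [1.0], [1.0]]

mean_updates = slot_updates(logits, values, mode="mean")
sum_updates = slot_updates(logits, values, mode="sum")

print("weighted mean:", mean_updates)
print("weighted sum: ", sum_updates)
```

With the weighted mean, both slots receive practically the same update even though slot 0 attracts most of the attention, so the share of inputs assigned to each slot is lost. With the constant divisor, the update for slot 0 is larger than the update for slot 1, which keeps that information available to later layers.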

Attention Normalization Impacts Cardinality Generalization in Slot Attention

Invariant Slot Attention: Object Discovery with Slot-Centric Reference Frames

Improving Object-centric Learning with Query Optimization

Adaptive Slot Attention: Object Discovery with Dynamic Slot Number

Sensitivity of Slot-Based Object-Centric Models to their Number of Slots

An Investigation of the Impact of Normalization Schemes on GCN Modelling

Enhancing Interpretable Object Abstraction via Clustering-based Slot Initialization

ACNe: Attentive Context Normalization for Robust Permutation-Equivariant Learning

Exploring the Role of the Bottleneck in Slot-Based Models Through Covariance Regularization

NAM: Normalization-based Attention Module

Identifiable Object-Centric Representation Learning via Probabilistic Slot Attention

Learning Global Object-Centric Representations via Disentangled Slot Attention

Guided Slot Attention for Unsupervised Video Object Segmentation

Efficient Multi-Scale Attention Module with Cross-Spatial Learning

Spotlight Attention: Robust Object-Centric Learning With a Spatial Locality Prior

NormAttention-PSN: A High-frequency Region Enhanced Photometric Stereo Network with Normalized Attention

Global Attention Mechanism: Retain Information to Enhance Channel-Spatial Interactions

Efficient Semantic Image Synthesis via Class-Adaptive Normalization

Slot-VAE: Object-Centric Scene Generation with Slot Attention

Normal Learning in Videos with Attention Prototype Network

Object-Centric Learning with Slot Mixture Module