Adaptive Slot Attention: Object Discovery with Dynamic Slot Number

Ke Fan, Zechen Bai, Tianjun Xiao, Tong He, Max Horn, Yanwei Fu, Francesco Locatello, Zheng Zhang
2024-06-13
Abstract: Object-centric learning (OCL) extracts the representation of objects with slots, offering an exceptional blend of flexibility and interpretability for abstracting low-level perceptual features. A widely adopted method within OCL is slot attention, which utilizes attention mechanisms to iteratively refine slot representations. However, a major drawback of most object-centric models, including slot attention, is their reliance on predefining the number of slots. This not only necessitates prior knowledge of the dataset but also overlooks the inherent variability in the number of objects present in each instance. To overcome this fundamental limitation, we present a novel complexity-aware object auto-encoder framework. Within this framework, we introduce an adaptive slot attention (AdaSlot) mechanism that dynamically determines the optimal number of slots based on the content of the data. This is achieved by proposing a discrete slot sampling module that is responsible for selecting an appropriate number of slots from a candidate list. Furthermore, we introduce a masked slot decoder that suppresses unselected slots during the decoding process. Our framework, tested extensively on object discovery tasks with various datasets, shows performance matching or exceeding top fixed-slot models. Moreover, our analysis substantiates that our method exhibits the capability to dynamically adapt the slot number according to each instance's complexity, offering the potential for further exploration in slot attention research. Project will be available at https://kfan21.github.io/AdaSlot/
Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
The core problem this paper addresses is that existing object-centric learning (OCL) methods based on slot attention rely on a predefined number of slots. Fixing the slot count in advance requires prior knowledge of the dataset and ignores the inherent variability in the number of objects across instances. Specifically:

1. **Limitations of a predefined number of slots**: Most existing object-centric models, including slot attention, use a preset number of slots. Because the number of objects varies from image to image, a fixed slot count cannot adapt to that variation.
2. **Segmentation quality**: A poorly chosen slot count degrades segmentation. Too few slots lead to under-segmentation, while too many can lead to over-segmentation. As shown in Figure 1 of the paper, the choice of slot number has a significant impact on segmentation quality.

To solve these problems, the authors propose a new complexity-aware object auto-encoder framework and introduce an Adaptive Slot Attention (AdaSlot) mechanism. AdaSlot dynamically determines the number of slots from the data content, which removes the need for a fixed slot count. Specifically:

- **Dynamic slot selection**: A discrete slot sampling module selects an appropriate number of slots from a candidate list.
- **Masked slot decoder**: During decoding, the masked slot decoder suppresses the unselected slots so that only the selected slots contribute to the reconstruction.

In experiments on multiple datasets, the method performs on par with or better than fixed-slot models. AdaSlot also adjusts the number of slots to the complexity of each instance, which benefits the object discovery task.

### Summary

The main contributions of the paper are:

1. Proposing a new complexity-aware object auto-encoder framework that removes the limitation of a fixed number of slots.
2. Introducing an efficient and differentiable slot selection module that identifies and retains the most informative slots before reconstruction.
3. Designing a masked slot decoder that effectively removes the information of unused slots.
4. Demonstrating the effectiveness of the method through extensive experiments, with particularly good results on instance-level slot number selection.

These improvements make AdaSlot more flexible and efficient for object discovery tasks across a variety of application scenarios.
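
The slot selection and masked decoding steps described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the two-column keep/drop logits, the Gumbel-max sampling, the "keep at least one slot" fallback, and the softmax mixture decoder are all assumptions chosen for illustration; the summary only states that slots are sampled discretely and that unselected slots are suppressed during decoding. The sketch is forward-only and does not show how gradients would flow through the discrete choice.

```python
import numpy as np


def sample_keep_mask(keep_logits, rng, tau=1.0):
    """Sample a hard keep/drop decision for each slot (hypothetical helper).

    keep_logits: (K, 2) array; column 0 scores "drop", column 1 scores "keep".
    Returns a (K,) array of 0.0 / 1.0 values.
    """
    # Gumbel-max trick (an assumption): add Gumbel noise, then take the argmax.
    u = rng.uniform(low=1e-9, high=1.0, size=keep_logits.shape)
    gumbel = -np.log(-np.log(u))
    noisy = (keep_logits + gumbel) / tau
    mask = (noisy.argmax(axis=-1) == 1).astype(np.float64)
    if mask.sum() == 0:
        # Assumed fallback: keep the most confident slot so decoding is defined.
        mask[keep_logits[:, 1].argmax()] = 1.0
    return mask


def masked_mixture_decode(slot_rgb, slot_alpha_logits, keep_mask):
    """Combine per-slot reconstructions while suppressing dropped slots.

    slot_rgb:          (K, H, W, 3) per-slot image reconstructions
    slot_alpha_logits: (K, H, W) unnormalized per-slot mixing logits
    keep_mask:         (K,) 0/1 mask from sample_keep_mask
    Returns the (H, W, 3) image and the (K, H, W) mixing weights.
    """
    if keep_mask.sum() == 0:
        raise ValueError("at least one slot must be kept")
    # Dropped slots get -inf logits, so their softmax weight is exactly zero.
    logits = np.where(keep_mask[:, None, None] > 0, slot_alpha_logits, -np.inf)
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=0, keepdims=True)
    image = (weights[..., None] * slot_rgb).sum(axis=0)
    return image, weights
```

In this sketch the mixing weights are renormalized over the kept slots only, so a dropped slot contributes nothing to any pixel and the number of active slots can differ from one input to the next.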