Improving Multi-label Recognition using Class Co-Occurrence Probabilities

Samyak Rawlekar, Shubhang Bhatnagar, Vishnuvardhan Pogunulu Srinivasulu, Narendra Ahuja
2024-09-20
Abstract: Multi-label Recognition (MLR) involves the identification of multiple objects within an image. To address the additional complexity of this problem, recent works have leveraged information from vision-language models (VLMs) trained on large text-image datasets for the task. These methods learn an independent classifier for each object (class), overlooking correlations in their occurrences. Such co-occurrences can be captured from the training data as conditional probabilities between a pair of classes. We propose a framework that extends the independent classifiers by incorporating co-occurrence information for object pairs to improve their performance. We use a Graph Convolutional Network (GCN) to enforce the conditional probabilities between classes by refining the initial estimates derived from image and text sources obtained using VLMs. We validate our method on four MLR datasets, where our approach outperforms all state-of-the-art methods.
Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning, Multimedia, Image and Video Processing
What problem does this paper attempt to address?
This paper addresses the difficulty of multi-label recognition (MLR), particularly when training data is limited. MLR requires identifying multiple objects in a single image, which is harder than single-label classification for two reasons. First, the number of possible class combinations grows exponentially with the number of classes, and so does the amount of data needed to learn them. Second, the spatial layout of the objects varies from image to image, which makes recognition harder still. Recent methods respond to these challenges by drawing on vision-language models (VLMs) trained on large-scale text-image datasets. However, they learn an independent classifier for each object and ignore the co-occurrence relationships between objects. These relationships can be estimated from the training data as conditional probabilities between pairs of classes, but existing methods do not exploit them.

The paper therefore proposes a framework that improves the independent classifiers by adding co-occurrence information for class pairs. The method has two stages. First, VLMs produce initial classification estimates from image and text sources. Then a graph convolutional network (GCN) refines these estimates using the conditional probabilities of class pairs. According to the summary, this both improves classification accuracy and alleviates over-fitting on small datasets. In short, the main goal of the paper is to improve multi-label recognition on small datasets by incorporating pairwise class co-occurrence information.
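To make the two ideas concrete, the following is a minimal NumPy sketch, not the authors' implementation. It estimates pairwise conditional probabilities from a binary label matrix and then applies one untrained propagation step, in the style of a graph convolution, to a vector of per-class scores. The function names, the `mix` and `weight` parameters, the removal of self-loops, and the row normalisation are all assumptions made for illustration. In a trained GCN, `weight` would be a learned matrix, and the paper's exact architecture may differ.

```python
import numpy as np


def cooccurrence_conditional_probs(labels):
    """Estimate P[i, j] = P(class j present | class i present).

    `labels` is a (num_images, num_classes) binary matrix taken from
    the training annotations (hypothetical input format).
    """
    labels = np.asarray(labels, dtype=np.float64)
    # counts[i, j] = number of images containing both class i and class j
    counts = labels.T @ labels
    # The diagonal holds the number of images containing each class
    occurrences = np.diag(counts).copy()[:, None]
    probs = np.zeros_like(counts)
    # Rows for classes that never occur are left at zero
    np.divide(counts, occurrences, out=probs, where=occurrences > 0)
    return probs


def graph_refine(scores, cond_probs, weight=1.0, mix=0.5):
    """Blend each class score with co-occurrence-weighted neighbour scores.

    Illustrative simplification: `weight` is a scalar stand-in for a
    learned GCN weight, and `mix` controls how much of the propagated
    signal replaces the initial estimate.
    """
    scores = np.asarray(scores, dtype=np.float64)
    # adj[i, j] = P(i | j): how strongly evidence for class j supports class i
    adj = np.array(cond_probs, dtype=np.float64).T
    np.fill_diagonal(adj, 0.0)  # drop self-loops (assumption)
    row_sums = adj.sum(axis=1, keepdims=True)
    adj = np.divide(adj, row_sums, out=np.zeros_like(adj), where=row_sums > 0)
    propagated = (adj @ scores) * weight
    return (1.0 - mix) * scores + mix * propagated


if __name__ == "__main__":
    # Toy annotations: 4 images, 3 classes (made-up data)
    train_labels = [[1, 1, 0],
                    [1, 0, 0],
                    [0, 1, 1],
                    [1, 1, 0]]
    cond = cooccurrence_conditional_probs(train_labels)
    initial = np.array([0.9, 0.2, 0.1])  # stand-in for VLM-derived scores
    print(cond)
    print(graph_refine(initial, cond))
```

In this toy example, classes 0 and 1 co-occur frequently, so a confident score for class 0 raises the refined score for class 1. An independent per-class classifier cannot produce this effect on its own.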