Image-guided topic modeling for interpretable privacy classification

Alina Elena Baia, Andrea Cavallaro
2024-09-27
Abstract: Predicting and explaining the private information contained in an image in human-understandable terms is a complex and contextual task. This task is challenging even for large language models. To facilitate the understanding of privacy decisions, we propose to predict image privacy based on a set of natural language content descriptors. These content descriptors are associated with privacy scores that reflect how people perceive image content. We generate descriptors with our novel Image-guided Topic Modeling (ITM) approach. ITM leverages, via multimodality alignment, both vision information and image textual descriptions from a vision language model. We use the ITM-generated descriptors to learn a privacy predictor, Priv$\times$ITM, whose decisions are interpretable by design. Our Priv$\times$ITM classifier outperforms the reference interpretable method by 5 percentage points in accuracy and performs comparably to the current non-interpretable state-of-the-art model.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper aims to improve both the interpretability and the accuracy of image privacy classification. Specifically, the authors aim to predict and explain the private information contained in an image through natural-language descriptions, so that users can understand why an image is judged private or public. Traditional methods can classify images by privacy to some extent, but they rarely explain their decisions, which makes it difficult for users to assess the risks of sharing an image.

### Main Problems and Solutions

1. **Subjectivity and Complexity of Privacy Classification**:
   - Image privacy classification is complex because it depends on context, and people differ in what they consider private. Traditional deep-learning models struggle with this subjectivity.
   - The authors propose a new method, Image-guided Topic Modeling (ITM), which combines visual information and textual descriptions to generate content descriptors. These descriptors are used both to predict image privacy and to explain the classification decisions.

2. **Limitations of Existing Methods**:
   - Existing methods usually rely on manually annotated features or predefined privacy modules, which limits their generality and flexibility.
   - Some methods use post-hoc explanation techniques (such as SHAP) to explain model decisions, but these techniques do not directly provide natural-language explanations.

3. **Improving the Interpretability of Privacy Classification**:
   - ITM uses multimodality alignment to combine visual information with image textual descriptions from a vision language model, and generates natural-language content descriptors from them.
   - These descriptors are associated with privacy scores that reflect how people perceive image content. The model can therefore explain its classification decisions in natural language as well as make them.

4. **Performance Improvement**:
   - The Priv×ITM classifier proposed by the authors outperforms the reference interpretable method by 5 percentage points in accuracy.
   - It performs comparably to the current state-of-the-art non-interpretable model.

### Summary

By introducing Image-guided Topic Modeling (ITM), this paper addresses two key problems in image privacy classification. The first is interpretability: users can see which content descriptors led to a decision. The second is accuracy: the resulting classifier improves on the reference interpretable method while remaining close to the non-interpretable state of the art. The approach could potentially be extended to other complex tasks that require interpretability.
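
To make the idea of content descriptors with privacy scores more concrete, the following is a minimal sketch of one way an interpretable-by-design predictor could work: a linear model over per-descriptor activation scores, where each descriptor's contribution to the decision can be read off directly. This is an illustration under stated assumptions, not the authors' implementation. The descriptor texts, the weights, the bias, and the function name `predict_privacy` are all hypothetical, and the summary above does not state whether Priv×ITM uses exactly this functional form.

```python
import math

# Hypothetical descriptors and privacy weights, invented for illustration.
# Positive weights push towards "private", negative towards "public".
DESCRIPTOR_WEIGHTS = {
    "a person's face in close-up": 1.8,
    "an identity document": 2.5,
    "a landscape with mountains": -1.6,
    "a plate of food": -1.1,
}
BIAS = -0.2  # hypothetical intercept


def predict_privacy(activations, weights=DESCRIPTOR_WEIGHTS, bias=BIAS, top_k=2):
    """Classify an image from its descriptor activations.

    `activations` maps a descriptor to a score in [0, 1] describing how well
    the descriptor matches the image (assumed to come from an image-text
    alignment model, which is not implemented here).

    Returns (label, probability_of_private, top_k_contributions).
    """
    # Each descriptor contributes weight * activation to the decision.
    contributions = {d: w * activations.get(d, 0.0) for d, w in weights.items()}
    logit = bias + sum(contributions.values())
    prob_private = 1.0 / (1.0 + math.exp(-logit))
    label = "private" if prob_private >= 0.5 else "public"
    # The explanation is the set of descriptors with the largest influence.
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return label, prob_private, ranked[:top_k]


if __name__ == "__main__":
    label, prob, why = predict_privacy(
        {"an identity document": 0.9, "a plate of food": 0.1}
    )
    print(label, round(prob, 3))
    for descriptor, contribution in why:
        print(f"  {descriptor}: {contribution:+.2f}")
```

Because the decision is a sum of per-descriptor terms, the explanation is simply the list of descriptors with the largest absolute contributions, so no separate post-hoc explanation step is needed in this sketch.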