Helping Visually Impaired People Take Better Quality Pictures

Maniratnam Mandal, Deepti Ghadiyaram, Danna Gurari, Alan C. Bovik
DOI: https://doi.org/10.1109/TIP.2023.3282067
2023-05-14
Abstract: Perception-based image analysis technologies can be used to help visually impaired people take better quality pictures by providing automated guidance, thereby empowering them to interact more confidently on social media. The photographs taken by visually impaired users often suffer from one or both of two kinds of quality issues: technical quality (distortions), and semantic quality, such as framing and aesthetic composition. Here we develop tools to help them minimize occurrences of common technical distortions, such as blur, poor exposure, and noise. We do not address the complementary problems of semantic quality, leaving that aspect for future work. The problem of assessing and providing actionable feedback on the technical quality of pictures captured by visually impaired users is hard enough, owing to the severe, commingled distortions that often occur. To advance progress on the problem of analyzing and measuring the technical quality of visually impaired user-generated content (VI-UGC), we built a very large and unique subjective image quality and distortion dataset. This new perceptual resource, which we call the LIVE-Meta VI-UGC Database, contains $40$K real-world distorted VI-UGC images and $40$K patches, on which we recorded $2.7$M human perceptual quality judgments and $2.7$M distortion labels. Using this psychometric resource we also created an automatic blind picture quality and distortion predictor that learns local-to-global spatial quality relationships, achieving state-of-the-art prediction performance on VI-UGC pictures, significantly outperforming existing picture quality models on this unique class of distorted picture data. We also created a prototype feedback system that helps to guide users to mitigate quality issues and take better quality pictures, by creating a multi-task learning framework.
Computer Vision and Pattern Recognition, Image and Video Processing
What problem does this paper attempt to address?
This paper aims to help visually impaired people take higher-quality photos. It focuses on the technical quality problems that often occur in their photos, such as blur, poor exposure, and noise. These problems make the photos harder to share and engage with on social media. The researchers therefore developed tools and techniques that help visually impaired users reduce these problems, with the goal of increasing their confidence and social participation. The main contributions of the paper are:

1. **Constructed the largest subjective image quality and distortion database**: This new resource, called the LIVE-Meta VI-UGC Database, contains approximately 40,000 pictures taken by visually impaired users and 40,000 patches cropped from those pictures. In a large-scale subjective picture quality study, the researchers collected 2.7 million perceptual quality judgments and 2.7 million distortion labels. The summary describes it as the largest publicly available distortion classification dataset to date.
2. **Created a state-of-the-art blind (no-reference) VI-UGC picture quality and distortion predictor**: Using a deep neural architecture based on the PaQ-2-PiQ model, the researchers developed a multi-task learning system that predicts both the perceived quality of pictures taken by visually impaired users and the presence of five common picture distortions. The model can also predict spatial maps of quality and distortion type, and it performs well on both the new dataset and the independent ORBIT image dataset.
3. **Developed a prototype feedback system**: Using the multi-task model, the researchers created a prototype feedback system to help visually impaired users take higher-quality photos. The system reports the overall (global) picture quality and suggests how to alleviate quality problems. It can also use the predicted spatial distortion maps to generate detailed, localized feedback. These ideas have been implemented on smartphones (iOS and Android).

Through these contributions, the paper aims to provide a computer vision solution that helps visually impaired users identify and reduce technical quality problems in their pictures by giving feedback (for example, auditory or tactile) while they are shooting.
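
The feedback step described above can be illustrated with a minimal Python sketch. It assumes a model has already produced a global quality score and a probability per distortion, and it maps them to short suggestions that could be spoken to the user. Everything in it is a hypothetical placeholder rather than the authors' implementation: the names `Prediction`, `feedback`, and `SUGGESTIONS`, the 0-100 score scale, both thresholds, the message wording, and the three distortion labels (taken from the examples named in the abstract, not from the paper's five-class list).

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical values: neither the scale nor the cutoffs come from the paper.
QUALITY_OK = 60.0           # assumed quality score on a 0-100 scale
DISTORTION_THRESHOLD = 0.5  # assumed per-distortion probability cutoff

# Placeholder labels and messages, chosen for illustration only.
SUGGESTIONS: Dict[str, str] = {
    "blur": "Hold the phone steady and try again.",
    "exposure": "Move to a spot with more even lighting.",
    "noise": "Add more light to reduce graininess.",
}


@dataclass
class Prediction:
    """Assumed model output: one global score plus per-distortion probabilities."""
    quality: float
    distortions: Dict[str, float]


def feedback(pred: Prediction) -> List[str]:
    """Turn a prediction into an ordered list of user-facing suggestions."""
    if pred.quality >= QUALITY_OK:
        return ["Picture quality looks good."]

    # Keep only known distortions above the cutoff, most likely first.
    flagged = sorted(
        (
            name
            for name, prob in pred.distortions.items()
            if prob >= DISTORTION_THRESHOLD and name in SUGGESTIONS
        ),
        key=lambda name: pred.distortions[name],
        reverse=True,
    )
    if not flagged:
        return ["Picture quality is low; try retaking the picture."]
    return [SUGGESTIONS[name] for name in flagged]


if __name__ == "__main__":
    example = Prediction(
        quality=35.0,
        distortions={"blur": 0.9, "exposure": 0.2, "noise": 0.6},
    )
    for message in feedback(example):
        print(message)
```

A rule table like this keeps the mapping from predictions to messages easy to audit and adjust. Localized feedback of the kind the summary mentions would additionally need the spatial distortion maps, which this sketch does not model.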