Integrating Human Vision Perception in Vision Transformers for Classifying Waste Items

Akshat Kishore Shrivastava, Tapan Kumar Gandhi
2023-12-21
Abstract: In this paper, we propose a novel methodology aimed at simulating the learning phenomenon of nystagmus through the application of differential blurring on datasets. Nystagmus is a biological phenomenon that influences human vision throughout life, notably by diminishing head shake from infancy to adulthood. Leveraging this concept, we address the issue of waste classification, a pressing global concern. The proposed framework comprises two modules, with the second module closely resembling the original Vision Transformer, a state-of-the-art model in classification tasks. The primary motivation behind our approach is to enhance the model's precision and adaptability, mirroring the real-world conditions that the human visual system undergoes. This methodology surpasses the standard Vision Transformer model in waste classification tasks by a margin of 2%. This improvement underscores the potential of our methodology for improving model precision by drawing inspiration from human vision perception. Further research on the proposed methodology could yield further performance gains, and the approach could be extended to other global issues.
Computer Vision and Pattern Recognition, Image and Video Processing
What problem does this paper attempt to address?
The paper proposes a method for simulating the biological phenomenon of nystagmus in the human visual system and uses that simulation to improve image classification, specifically waste classification. The authors focus on how to reproduce, during model training, the visual blur associated with nystagmus.

Nystagmus affects human vision from infancy to adulthood. It decreases with age, so that vision transitions from blurry to clear. Based on this, the researchers process the dataset with varying degrees of Gaussian blur to simulate the human visual system gradually becoming clearer over time, and apply the result to the waste classification task.

The framework has two main modules:

1. **Nystagmus Simulation Module**: simulates the transition of human vision from blurry to clear by applying different levels of Gaussian blur to the images in the dataset.
2. **Vision Transformer Module**: a model similar to the standard Vision Transformer architecture, used to perform the classification task.

The method outperformed the standard Vision Transformer model on the waste classification task, improving accuracy by approximately 2%. This suggests that processing data in a way that mimics human visual perception can improve model performance.
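
To make the blurry-to-clear idea concrete, the following is a minimal sketch of differential Gaussian blurring, not the authors' implementation. The text above does not give the blur levels or how they are scheduled, so the linear schedule, the default `sigma_start=4.0`, the 3-sigma kernel truncation, the edge handling, and all function names here are assumptions made for illustration. Images are plain nested lists of grayscale values so the example needs only the standard library.

```python
import math


def blur_schedule(num_stages, sigma_start=4.0, sigma_end=0.0):
    """Return one Gaussian sigma per stage, decreasing linearly.

    Assumption: a linear blurry-to-clear schedule; the source does not
    specify the actual blur levels.
    """
    if num_stages < 1:
        raise ValueError("num_stages must be >= 1")
    if num_stages == 1:
        return [sigma_end]
    step = (sigma_start - sigma_end) / (num_stages - 1)
    return [sigma_start - i * step for i in range(num_stages)]


def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel, truncated at 3 sigma (assumption)."""
    if sigma <= 0:
        return [1.0]  # identity kernel: no blur
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(x * x) / (2 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def _blur_rows(image, kernel):
    """Convolve every row with the kernel, replicating edge pixels."""
    radius = len(kernel) // 2
    out = []
    for row in image:
        n = len(row)
        new_row = []
        for j in range(n):
            acc = 0.0
            for k, w in enumerate(kernel):
                idx = min(max(j + k - radius, 0), n - 1)
                acc += w * row[idx]
            new_row.append(acc)
        out.append(new_row)
    return out


def gaussian_blur(image, sigma):
    """Separable Gaussian blur: filter rows, transpose, filter rows again."""
    kernel = gaussian_kernel(sigma)
    blurred = _blur_rows(image, kernel)
    transposed = [list(col) for col in zip(*blurred)]
    blurred_t = _blur_rows(transposed, kernel)
    return [list(col) for col in zip(*blurred_t)]


# Usage: produce progressively sharper copies of one toy 5x5 image.
image = [[0.0] * 5 for _ in range(5)]
image[2][2] = 1.0
sigmas = blur_schedule(5)
stages = [gaussian_blur(image, s) for s in sigmas]
```

In a training pipeline, each entry of `stages` would correspond to one phase of the dataset, with the final phase (sigma 0) being the unblurred images. In practice a library routine such as a Gaussian filter from an image-processing package would replace the hand-written convolution.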