Improving generalization for geometric variations in images for efficient deep learning
Shivam Grover, Kshitij Sidana, Vanita Jain, Rachna Jain, Anand Nayyar
DOI: https://doi.org/10.1007/s11042-023-17897-z
IF: 2.577
2024-01-12
Multimedia Tools and Applications
Abstract: Deep learning models for tasks such as image classification struggle to adapt to the unseen geometric variations (such as scale, perspective, and pose) that real-world data exhibits, which can cause even a state-of-the-art model to perform poorly in real-world applications. This paper aims to address the issue with a two-step method for improving generalization over geometric variations: first, leveraging GANs for data augmentation to bring realistic geometric diversity into the dataset; second, adding a deformable convolutional layer within CNNs (which introduces a trainable offset parameter for geometric variations), making the model more dynamic while learning. The model's efficacy is demonstrated through experiments in two domains: image classification and view translation. Metrics including accuracy, Root Mean Square Error (RMSE), and the Structural Similarity Index (SSIM) are used to evaluate the model's performance. The proposed method improves performance by 36% compared to the existing state-of-the-art model for conditional image translation, and improves the baseline accuracy of a Convolutional Neural Network (CNN) classifier by 7% on the classification task. The proposed approach outperforms existing methods, demonstrating superior accuracy and lower RMSE on deformed datasets.
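To make the "trainable offset" idea in the abstract concrete, the following is a minimal pure-Python sketch of how a deformable convolution samples its input. It is not the authors' implementation: the function names `bilinear` and `deform_conv3x3` are hypothetical, the layer is reduced to a single channel with a 3x3 kernel, and the offsets are passed in by hand, whereas in a real layer they would be predicted by an auxiliary convolution and learned during training.

```python
import math


def bilinear(img, y, x):
    """Sample img (a list of rows) at fractional (y, x); outside the image counts as 0."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    total = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                # Weight falls off linearly with distance to the neighbouring pixel.
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                total += weight * img[yy][xx]
    return total


def deform_conv3x3(img, kernel, offsets):
    """Single-channel 3x3 deformable convolution, stride 1, zero padding 1.

    offsets[i][j] holds nine (dy, dx) pairs, one per kernel tap, for output
    position (i, j). Assumption: offsets are given here, not learned.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for k in range(9):
                ky, kx = divmod(k, 3)
                dy, dx = offsets[i][j][k]
                # Regular grid position (i + ky - 1, j + kx - 1) shifted by the offset.
                acc += kernel[ky][kx] * bilinear(img, i + ky - 1 + dy, j + kx - 1 + dx)
            out[i][j] = acc
    return out
```

With all offsets set to zero, the sketch reduces to an ordinary 3x3 convolution; non-zero fractional offsets move each kernel tap off the regular grid, which is what lets such a layer follow changes in scale, perspective, or pose.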
computer science, information systems, theory & methods,engineering, electrical & electronic, software engineering