Adaptive Knowledge Distillation for Classification of Hand Images using Explainable Vision Transformers

Thanh Thi Nguyen, Campbell Wilson, Janis Dalins
2024-08-20
Abstract: Assessing the forensic value of hand images involves the use of unique features and patterns present in an individual's hand. The human hand has distinct characteristics, such as the pattern of veins, fingerprints, and the geometry of the hand itself. This paper investigates the use of vision transformers (ViTs) for classification of hand images. We use explainability tools to explore the internal representations of ViTs and assess their impact on the model outputs. Utilizing the internal understanding of ViTs, we introduce distillation methods that allow a student model to adaptively extract knowledge from a teacher model while learning on data of a different domain to prevent catastrophic forgetting. Two publicly available hand image datasets are used to conduct a series of experiments to evaluate performance of the ViTs and our proposed adaptive distillation methods. The experimental results demonstrate that ViT models significantly outperform traditional machine learning methods and the internal states of ViTs are useful for explaining the model outputs in the classification task. By averting catastrophic forgetting, our distillation methods achieve excellent performance on data from both source and target domains, particularly when these two domains exhibit significant dissimilarity. The proposed approaches therefore can be developed and implemented effectively for real-world applications such as access control, identity verification, and authentication systems.
Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The paper addresses transfer learning for hand-image classification across domains, in particular how to keep a model from catastrophically forgetting the source domain while it adapts to new-domain data. Specifically:

1. **Forensic value assessment of hand images**: The paper explores the classification of hand images using Vision Transformers (ViTs). The hand has unique features, such as vein patterns, fingerprints, and hand geometry, which can be used for applications such as identity verification.
2. **Application of interpretability tools**: Interpretability tools (such as Deep Feature Factorization and Grad-CAM) are used to study how internal representations influence model outputs, which improves the interpretability of the model.
3. **Introduction of knowledge distillation methods**: To prevent catastrophic forgetting, an adaptive distillation method is proposed that enables the student model to extract knowledge from the teacher model without accessing the source-domain data while still performing well on new-domain data.
4. **Cross-domain adaptation problem**: When a model is trained on a specific domain (e.g., palmar hand images), its performance on a different domain (e.g., dorsal hand images) declines significantly. The paper aims to solve this problem and ensure that the model maintains excellent performance in both domains.

### Specific problem description

- **Catastrophic forgetting**: When the model is fine-tuned on new-domain data, it tends to forget the knowledge learned in the source domain. To solve this problem, the paper proposes an adaptive distillation method.
- **Differences between domains**: Palmar and dorsal hand images differ significantly in features and appearance, so the model performs differently in the two domains. The goal of the paper is to enable the model to adapt to these differences.

### Solutions

- **Adaptive distillation method**:
  - **Method 1**: Early in training, the student model follows the behavior of the teacher model closely, then gradually deviates as learning progresses. This helps to stabilize the learning process of the student model.
  - **Method 2**: The student model imitates the internal states of the teacher model, not just its output layer, so that it can better replicate the teacher's hidden representations.
- **Experimental verification**: A series of experiments on two publicly available hand-image datasets (the IIT Delhi dataset and the 11k Hands dataset) verifies the performance of the ViT model and the proposed adaptive distillation methods.

### Summary

The main contributions of this paper are:

1. Using the ViT model to classify hand images and comparing its performance with existing methods.
2. Exploring the components of the ViT model and evaluating their influence on the model output.
3. Proposing an adaptive distillation method that enables the ViT model to perform well on new-domain data while retaining source-domain knowledge.

These methods are relevant to practical applications such as access control, identity verification, and authentication systems.
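
To make the two distillation ideas listed under Solutions more concrete, here is a minimal pure-Python sketch of a combined loss. It is an illustration under stated assumptions, not the authors' implementation: the summary does not give the exact schedule or loss terms, so the linear decay in `distill_weight`, the temperature-scaled KL term, the mean-squared-error term on hidden states, and all function and parameter names are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Task loss on the new-domain label."""
    return -math.log(softmax(logits)[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distill_weight(step, total_steps, start=1.0, end=0.0):
    """Assumed linear schedule for Method 1: the teacher's influence
    is strongest at the start of training and decays afterwards."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * frac


def hidden_state_mse(student_hidden, teacher_hidden):
    """Assumed Method 2 term: mean squared error between hidden states."""
    diffs = [(s - t) ** 2 for s, t in zip(student_hidden, teacher_hidden)]
    return sum(diffs) / len(diffs)


def adaptive_distillation_loss(student_logits, teacher_logits, label,
                               step, total_steps,
                               student_hidden=None, teacher_hidden=None,
                               temperature=2.0, hidden_coeff=1.0):
    """Task loss plus a decaying output-matching term (Method 1) and an
    optional hidden-state matching term (Method 2)."""
    loss = cross_entropy(student_logits, label)
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    weight = distill_weight(step, total_steps)
    loss += weight * kl_divergence(teacher_probs, student_probs)
    if student_hidden is not None and teacher_hidden is not None:
        loss += hidden_coeff * hidden_state_mse(student_hidden, teacher_hidden)
    return loss
```

With this schedule, the same student and teacher logits give a larger loss at step 0 than at the final step, because the output-matching term is fully weighted at the start and vanishes by the end. In practice the logits and hidden states would come from ViT forward passes on target-domain images; plain lists are used here only to keep the sketch self-contained.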