Abstract: This thesis investigates the effectiveness of SimCLR, a contrastive learning technique, for Greek letter recognition, with a focus on the impact of different augmentation techniques. We pretrain the SimCLR backbone on the Alpub dataset (the pretraining dataset) and fine-tune it on the smaller ICDAR dataset (the fine-tuning dataset), then compare its performance against traditional baseline models trained with cross-entropy and triplet loss functions. We also examine the role of data augmentation strategies, which are essential to the SimCLR training process. Methodologically, we examine three primary approaches: (1) a baseline model trained with cross-entropy loss, (2) a triplet embedding model with a classification layer, and (3) a SimCLR-pretrained model with a classification layer. We first train the baseline, triplet, and SimCLR models with 93 augmentations on ResNet-18 and ResNet-50 networks using the ICDAR dataset, and select the top four augmentations with a statistical t-test. SimCLR is then pretrained on the Alpub dataset and fine-tuned on the ICDAR dataset. The triplet loss model undergoes a similar process: it is pretrained using the top four augmentations before being fine-tuned on ICDAR. Our experiments show that SimCLR does not outperform the baselines in letter recognition: the baseline model with cross-entropy loss performs better than both SimCLR and the triplet loss model. This study provides a detailed evaluation of contrastive learning for letter recognition, highlighting SimCLR's limitations and the strengths of traditional supervised learning models on this task. We believe that SimCLR's cropping strategies may cause a semantic shift in the input image, reducing training effectiveness despite the large pretraining dataset.
Our code is available at <a class="link-external link-https" href="https://github.com/DIVA-DIA/MT_augmentation_and_contrastive_learning/" rel="external noopener nofollow">https://github.com/DIVA-DIA/MT_augmentation_and_contrastive_learning/</a>.
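
For readers unfamiliar with the two embedding objectives compared above, the following is a minimal sketch of the NT-Xent loss commonly used with SimCLR and of a standard triplet margin loss. It is not taken from the thesis code: the function names, the temperature of 0.5, the margin of 1.0, and the Euclidean distance in the triplet loss are all illustrative assumptions, and the actual implementation may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent(z1, z2, temperature=0.5):
    """NT-Xent loss for a batch of paired embeddings.

    z1[i] and z2[i] are embeddings of two augmented views of image i.
    The temperature value is an assumed, illustrative default.
    """
    n = len(z1)
    z = z1 + z2  # 2N embeddings; the positive for index i is (i + n) % 2n
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)
        # Similarities to every other embedding (self-similarity excluded).
        logits = [cosine(z[i], z[k]) / temperature
                  for k in range(2 * n) if k != i]
        pos_logit = cosine(z[i], z[pos]) / temperature
        # Cross-entropy with the positive pair as the target class.
        total += math.log(sum(math.exp(l) for l in logits)) - pos_logit
    return total / (2 * n)


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet margin loss with Euclidean distance (assumed margin)."""
    d_ap = math.dist(anchor, positive)
    d_an = math.dist(anchor, negative)
    return max(0.0, d_ap - d_an + margin)


if __name__ == "__main__":
    views_a = [[1.0, 0.0], [0.0, 1.0]]
    views_b = [[0.9, 0.1], [0.1, 0.9]]
    print("NT-Xent:", nt_xent(views_a, views_b))
    print("Triplet:", triplet_loss([0.0, 0.0], [0.0, 3.0], [0.0, 1.0]))
```

In NT-Xent the positive pair consists of two augmented views of the same image, so the loss depends directly on whether the augmentations preserve the image's identity. This is consistent with the concern raised above that cropping may shift the semantics of a letter image.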
