Abstract: This thesis investigates the effectiveness of SimCLR, a contrastive learning technique, for Greek letter recognition, with a focus on the impact of different augmentation techniques. We pretrain the SimCLR backbone on the Alpub dataset (pretraining dataset) and fine-tune it on the smaller ICDAR dataset (fine-tuning dataset), then compare its performance against traditional baseline models trained with cross-entropy and triplet loss functions. We also explore the role of data augmentation strategies, which are essential to SimCLR training. Methodologically, we examine three approaches: (1) a baseline model using cross-entropy loss, (2) a triplet embedding model with a classification layer, and (3) a SimCLR pretrained model with a classification layer. We first train the baseline, triplet, and SimCLR models with 93 augmentations on ResNet-18 and ResNet-50 networks using the ICDAR dataset, and select the top four augmentations using a statistical t-test. SimCLR is then pretrained on the Alpub dataset and fine-tuned on the ICDAR dataset. The triplet loss model undergoes a similar process: it is pretrained with the top four augmentations before being fine-tuned on ICDAR. Our experiments show that SimCLR does not outperform the baselines on letter recognition. The cross-entropy baseline performs better than both SimCLR and the triplet loss model. This study provides a detailed evaluation of contrastive learning for letter recognition, highlighting SimCLR's limitations and the strengths of traditional supervised learning in this task. We believe SimCLR's cropping strategies may cause a semantic shift in the input image, reducing training effectiveness despite the large pretraining dataset.
Our code is available at https://github.com/DIVA-DIA/MT_augmentation_and_contrastive_learning/.
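
For readers unfamiliar with SimCLR pretraining, the objective it optimizes is the NT-Xent (normalized temperature-scaled cross-entropy) loss described in the original SimCLR paper: two augmented views of the same image form a positive pair, and all other views in the batch act as negatives. The snippet below is a minimal, dependency-free sketch of that loss and is not taken from the thesis code. The function name `nt_xent_loss`, the default temperature of 0.5, and the toy two-dimensional embeddings are assumptions chosen only for illustration.

```python
import math


def _normalize(vector):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def nt_xent_loss(views_a, views_b, temperature=0.5):
    """Illustrative NT-Xent loss over a batch of paired embeddings.

    views_a[i] and views_b[i] are embeddings of two augmentations of image i.
    The temperature default is an assumption, not a value from the thesis.
    """
    if len(views_a) != len(views_b):
        raise ValueError("both views must contain the same number of embeddings")
    n = len(views_a)
    z = [_normalize(v) for v in list(views_a) + list(views_b)]
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)  # index of the other view of the same image
        sims = [
            sum(a * b for a, b in zip(z[i], z[k])) / temperature
            for k in range(2 * n)
        ]
        # Log-sum-exp over every candidate except the anchor itself.
        others = [sims[k] for k in range(2 * n) if k != i]
        peak = max(others)
        log_denominator = peak + math.log(sum(math.exp(s - peak) for s in others))
        total += log_denominator - sims[positive]
    return total / (2 * n)


# Toy usage: when the two views of each image agree, the loss is lower
# than when the views are paired with the wrong image.
aligned = nt_xent_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
mismatched = nt_xent_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < mismatched)
```

Because the loss rewards agreement between two augmented views of the same image, an augmentation that changes what a view depicts (for instance, an aggressive crop that leaves only part of a letter) pushes the model to match views that no longer share a label, which is consistent with the semantic-shift explanation suggested in the abstract.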

Understanding the Benefits of SimCLR Pre-Training in Two-Layer Convolutional Neural Networks

A Simple Framework for Contrastive Learning of Visual Representations

Boost Supervised Pretraining for Visual Transfer Learning: Implications of Self-Supervised Contrastive Representation Learning

Resource and data efficient self supervised learning

Semi-Supervising Learning, Transfer Learning, and Knowledge Distillation with SimCLR

Layer Grafted Pre-training: Bridging Contrastive Learning And Masked Image Modeling For Label-Efficient Representations

Align Yourself: Self-supervised Pre-training for Fine-grained Recognition via Saliency Alignment

SimC3D: A Simple Contrastive 3D Pretraining Framework Using RGB Images

DenseCL: A Simple Framework for Self-Supervised Dense Visual Pre-Training

Contrastive Learning for Character Detection in Ancient Greek Papyri

Counterfactual contrastive learning: robust representations via causal image synthesis

Dense Contrastive Learning for Self-Supervised Visual Pre-Training

Slimmable Networks for Contrastive Self-supervised Learning

SimCLR-Inception: An Image Representation Learning and Recognition Model for Robot Vision

With a Little Help from My Friends: Nearest-Neighbor Contrastive Learning of Visual Representations

Siamese Prototypical Contrastive Learning

Do Pre-trained Models Benefit Equally in Continual Learning?

Non-Contrastive Learning Meets Language-Image Pre-Training

MixCL: Pixel label matters to contrastive learning

Contrastive learning-based pretraining improves representation and transferability of diabetic retinopathy classification models