Explorations in Self-Supervised Learning: Dataset Composition Testing for Object Classification

Raynor Kirkson E. Chavez, Kyle Gabriel M. Reynoso
2024-12-01
Abstract: This paper investigates the impact of sampling and pretraining using datasets with different image characteristics on the performance of self-supervised learning (SSL) models for object classification. To do this, we sample two apartment datasets from the Omnidata platform based on modality, luminosity, image size, and camera field of view and use them to pretrain a SimCLR model. The encodings generated from the pretrained model are then transferred to a supervised ResNet-50 model for object classification. Through A/B testing, we find that depth-pretrained models are more effective on low-resolution images, while RGB-pretrained models perform better on higher-resolution images. We also discover that increasing the luminosity of training images can improve the performance of models on low-resolution images without negatively affecting their performance on higher-resolution images.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses **how image characteristics in the pretraining data (modality, brightness, resolution, and camera field of view) affect the performance of self-supervised learning (SSL) models on object classification tasks**. Specifically, the authors examine four aspects:

1. **Impact of dataset composition**: How the performance of SSL models changes when they are pretrained on datasets with different image characteristics, including modality (RGB vs. depth), brightness, image size, and camera field of view.
2. **Low-resolution vs. high-resolution images**: How depth-pretrained and RGB-pretrained models compare on low-resolution and high-resolution images. The results show that the depth-pretrained model performs better on low-resolution images, while the RGB-pretrained model performs better on high-resolution images.
3. **Impact of brightness on model performance**: Whether increasing the brightness of training images can improve performance on low-resolution images without hurting performance on high-resolution images. The experiments show that appropriately increasing the brightness does improve performance on low-resolution images.
4. **Impact of field of view (FOV)**: How the field of view of the pretraining images affects model performance. The results show that changing the field of view has little effect on the STL-10 dataset, while on the CIFAR-10 dataset pretraining on high-FOV images performs slightly better.
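To make the brightness criterion more concrete, the following is a minimal sketch of how images might be bucketed by luminosity before pretraining. It is an illustration only: the summary does not say how the authors measure luminosity, so the Rec. 709 weighting (applied directly to 8-bit values, ignoring gamma), the `threshold` value, and the function names `mean_luminance` and `split_by_luminance` are all assumptions.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each channel in 0..255
Image = List[List[Pixel]]      # rows of pixels


def mean_luminance(image: Image) -> float:
    """Mean luminance of an 8-bit RGB image, scaled to [0, 1].

    Assumption: Rec. 709 channel weights applied to raw pixel values.
    """
    total = 0.0
    count = 0
    for row in image:
        for r, g, b in row:
            total += 0.2126 * r + 0.7152 * g + 0.0722 * b
            count += 1
    if count == 0:
        raise ValueError("image has no pixels")
    return total / (255.0 * count)


def split_by_luminance(
    images: Dict[str, Image], threshold: float = 0.5
) -> Tuple[List[str], List[str]]:
    """Split image names into (dim, bright) lists.

    The threshold is a hypothetical cut-off, not a value from the paper.
    """
    dim: List[str] = []
    bright: List[str] = []
    for name, image in images.items():
        if mean_luminance(image) >= threshold:
            bright.append(name)
        else:
            dim.append(name)
    return sorted(dim), sorted(bright)
```

Each bucket could then serve as a separate pretraining set, so that two otherwise identical models differ only in the luminosity of the images they were pretrained on.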
### Research background

Self-supervised learning (SSL) has received extensive attention in machine learning in recent years because it can train on large amounts of unlabeled data and, through transfer learning, achieve good results on a variety of tasks. However, the performance of SSL models depends heavily on the characteristics of the input data, and in particular on how image features vary within the dataset. Researchers have therefore begun to study how different dataset compositions affect the performance of SSL models.

### Main findings

- **Modality impact**: The depth-pretrained model performs better on low-resolution images, while the RGB-pretrained model performs better on high-resolution images.
- **Brightness impact**: Increasing the brightness can improve model performance on low-resolution images without affecting performance on high-resolution images.
- **Field-of-view impact**: Changing the field of view has a slight positive effect on some datasets (such as CIFAR-10) but little effect on others (such as STL-10).

### Conclusion

The authors conclude that the composition of the pretraining dataset (modality, brightness, resolution, and field of view) significantly affects the performance of self-supervised learning models. Future research can further explore the interactions between these factors to optimize model performance on different tasks.