Generating and Improving a Dataset of Masked Faces Using Data Augmentation

Waleed Ayad, Siraj Qays, Ali Al-Naji
DOI: https://doi.org/10.51173/jt.v5i2.1140
2023-06-10
Journal of Techniques
Abstract: Before the spread of COVID-19 in 2020, modern face recognition systems performed excellently. Once countries required their populations to wear masks, the discriminative ability of those systems dropped noticeably, because they had been trained on large-scale datasets of unmasked faces and no large-scale masked-face datasets were available at the time. To help address the shortage of large-scale datasets of people wearing masks, this paper presents a method that creates simulated masks and overlays them on faces, in two main steps. The first step detects, aligns, and crops the faces in an unmasked-face dataset and then applies simulated masks to them using the dlib-ml library. This method was used to generate a masked-face dataset (CASIA-mask). The second step applies five data augmentation techniques to the generated dataset. To evaluate the masked dataset and the data augmentation, FaceNet, one of the latest and most important face recognition systems, was trained on the masked dataset and achieved an accuracy of 96.4%. The same system achieved 97.71% when trained on CASIA-mask together with the augmented data.
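The second step can be illustrated with a minimal sketch. The abstract does not name the five augmentation techniques, so the five used here (horizontal flip, brightness shift, Gaussian noise, horizontal translation, and random erasing) are assumptions chosen for illustration, as are the function name `augment` and all parameter ranges. The sketch uses only NumPy and treats a face as an H x W x 3 `uint8` array.

```python
import numpy as np


def augment(face, rng):
    """Return five augmented copies of an HxWx3 uint8 face image.

    The five techniques are illustrative assumptions, not the
    paper's confirmed choices.
    """
    h, w = face.shape[:2]
    out = {}

    # 1. Horizontal flip: mirror the image left to right.
    out["flip"] = face[:, ::-1].copy()

    # 2. Brightness shift: add a random offset, clipped to [0, 255].
    shift = int(rng.integers(-40, 41))
    out["brightness"] = np.clip(
        face.astype(np.int16) + shift, 0, 255
    ).astype(np.uint8)

    # 3. Gaussian noise: add zero-mean noise, clipped to [0, 255].
    noise = rng.normal(0.0, 10.0, size=face.shape)
    out["noise"] = np.clip(face + noise, 0, 255).astype(np.uint8)

    # 4. Horizontal translation by a few pixels, zero-padded.
    dx = int(rng.integers(-4, 5))
    shifted = np.zeros_like(face)
    if dx >= 0:
        shifted[:, dx:] = face[:, : w - dx]
    else:
        shifted[:, : w + dx] = face[:, -dx:]
    out["translate"] = shifted

    # 5. Random erasing: black out a small square patch.
    size = max(1, min(h, w) // 8)
    top = int(rng.integers(0, h - size + 1))
    left = int(rng.integers(0, w - size + 1))
    erased = face.copy()
    erased[top : top + size, left : left + size] = 0
    out["erase"] = erased

    return out


# Usage: a random array stands in for a cropped, masked face image.
rng = np.random.default_rng(0)
face = rng.integers(0, 256, size=(112, 112, 3), dtype=np.uint8)
augmented = augment(face, rng)
```

Each augmented copy keeps the original shape and dtype, so it can be added to the training set alongside the unmodified masked image.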