Two Deep Learning Solutions for Automatic Blurring of Faces in Videos

Roman Plaud, Jose-Luis Lisani
2024-09-23
Abstract: The widespread use of cameras in everyday life situations generates a vast amount of data that may contain sensitive information about the people and vehicles moving in front of them (location, license plates, physical characteristics, etc.). In particular, people's faces are recorded by surveillance cameras in public spaces. In order to ensure the privacy of individuals, face blurring techniques can be applied to the collected videos. In this paper we present two deep-learning-based options to tackle the problem. First, a direct approach, consisting of a classical object detector (based on the YOLO architecture) trained to detect faces, which are subsequently blurred. Second, an indirect approach, in which a Unet-like segmentation network is trained to output a version of the input image in which all the faces have been blurred.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The problem this paper addresses is how to automatically blur faces in videos in order to protect the privacy of individuals in public spaces. Cameras are now widespread in everyday life, and the data they collect may contain sensitive information about the people and vehicles in view (location, license plates, physical characteristics, etc.). In particular, surveillance cameras in public places record people's faces. Blurring those faces protects personal privacy. The paper explores two deep-learning-based methods to achieve this goal:

1. **Direct method**: Use a classical object detector (based on the YOLO architecture) to detect faces in the video, then blur the detected faces.
2. **Indirect method**: Train a Unet-like segmentation network to output a version of the input image in which all faces are already blurred.

### Direct method

The direct method relies on the YOLOv5Face model, which is based on the YOLO architecture and is designed specifically for face detection. Its workflow is as follows:

- **Face detection**: The YOLOv5Face model detects faces in the image and returns their bounding box coordinates.
- **Elliptical transformation**: Each detected rectangular box is converted into an ellipse, which gives a better visual result.
- **Face blurring**: A Gaussian blur standard deviation is chosen according to the size of the detected face, the entire frame is blurred, and the corresponding region of the original image is replaced with its blurred version.

### Indirect method

The indirect method uses a Unet architecture similar to that of DeOldify, which was originally designed for colorizing grayscale images. Once trained, the network directly outputs the image with blurred faces, without a separate face detection step. The specific steps are:

- **Dataset construction**: The FDDB and WIDER FACE datasets are used to generate training pairs consisting of an original image and its blurred version.
- **Network architecture**: A Unet architecture is adopted, with a pre-trained ResNet50 as the encoder and a decoder that contains self-attention layers.
- **Inference method**: The input image is down-sampled and fed to the network, and the output is up-sampled back to the original size. A mask is then extracted by thresholding the L1 norm, and the blurred faces are pasted into the original image.

### Experiment and evaluation

The paper compares the two methods experimentally, evaluating them on two aspects:

1. **Visual evaluation and face counting**: Check whether each face is blurred completely enough that the person cannot be recognized, and count the number of correctly blurred faces in each test image.
2. **Computation time**: Compare the inference times of the methods to assess their computational efficiency.

Overall, the paper explores and compares two deep-learning-based face-blurring techniques in order to find the most suitable method for automatic privacy protection.
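The blurring step of the direct method (ellipse from a bounding box, size-dependent Gaussian blur of the whole frame, then replacement inside the ellipse) can be illustrated with a minimal NumPy-only sketch. It is not the authors' implementation: the face detector is left out, and the function names, the `(x0, y0, x1, y1)` box format, and the rule `sigma = sigma_per_pixel * box size` are assumptions made for illustration.

```python
import numpy as np


def gaussian_kernel(sigma: float) -> np.ndarray:
    """1-D Gaussian kernel truncated at about 3 sigma, normalised to sum to 1."""
    radius = max(1, int(3 * sigma + 0.5))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()


def gaussian_blur(img: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur of an H x W x C float image (edge padding)."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    padded = np.pad(img, ((r, r), (r, r), (0, 0)), mode="edge")
    # Filter along the vertical axis, then along the horizontal axis.
    tmp = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, tmp)


def ellipse_mask(shape, box) -> np.ndarray:
    """Boolean mask of the ellipse inscribed in box = (x0, y0, x1, y1)."""
    h, w = shape
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    # Semi-axes, floored to avoid division by zero on degenerate boxes.
    ax, ay = max((x1 - x0) / 2.0, 0.5), max((y1 - y0) / 2.0, 0.5)
    yy, xx = np.mgrid[0:h, 0:w]
    return ((xx - cx) / ax) ** 2 + ((yy - cy) / ay) ** 2 <= 1.0


def blur_faces_direct(frame: np.ndarray, boxes, sigma_per_pixel: float = 0.1) -> np.ndarray:
    """Blur each detected face inside its inscribed ellipse.

    `sigma_per_pixel` is a hypothetical scaling factor: the source only says
    that the standard deviation depends on the size of the detected face.
    """
    frame = frame.astype(np.float64)
    out = frame.copy()
    for box in boxes:
        x0, y0, x1, y1 = box
        sigma = max(1.0, sigma_per_pixel * max(x1 - x0, y1 - y0))
        blurred = gaussian_blur(frame, sigma)    # blur the entire frame
        mask = ellipse_mask(frame.shape[:2], box)
        out[mask] = blurred[mask]                # replace only inside the ellipse
    return out
```

In practice the boxes would come from the face detector, and an optimized routine from an image-processing library would replace the slow pure-NumPy blur used here for self-containment.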
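The final step of the indirect method's inference (extracting a mask by thresholding the L1 norm and pasting the blurred faces into the original image) might look like the sketch below. It assumes the network output has already been up-sampled to the original size and that both images are floats in [0, 1]. The function name and the threshold value are hypothetical, since the summary gives neither.

```python
import numpy as np


def composite_indirect(original: np.ndarray, network_output: np.ndarray,
                       threshold: float = 0.05):
    """Paste the network's blurred regions into the original image.

    Both inputs are H x W x C arrays of the same shape. `threshold` is an
    assumed value, not one taken from the paper.
    """
    original = original.astype(np.float64)
    network_output = network_output.astype(np.float64)
    # Per-pixel L1 norm of the difference, summed over colour channels.
    l1 = np.abs(original - network_output).sum(axis=2)
    mask = l1 > threshold                 # pixels the network changed noticeably
    out = original.copy()
    out[mask] = network_output[mask]      # keep original pixels everywhere else
    return out, mask
```

Compositing through a mask keeps the untouched parts of the frame at full resolution, so only the face regions carry any loss from the down-sampling and up-sampling round trip.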