Abstract: The widespread use of cameras in everyday life situations generates a vast amount of data that may contain sensitive information about the people and vehicles moving in front of them (location, license plates, physical characteristics, etc.). In particular, people's faces are recorded by surveillance cameras in public spaces. In order to ensure the privacy of individuals, face blurring techniques can be applied to the collected videos. In this paper, we present two deep-learning-based options to tackle the problem. The first is a direct approach: a classical object detector (based on the YOLO architecture) is trained to detect faces, which are subsequently blurred. The second is an indirect approach, in which a Unet-like segmentation network is trained to output a version of the input image in which all the faces have been blurred.

What problem does this paper attempt to address?

The paper addresses the problem of automatically blurring faces in videos to protect the privacy of individuals in public spaces. Cameras are now ubiquitous in daily life, so the data they record may contain sensitive information about people and vehicles, such as location, license plates, and physical characteristics. In particular, surveillance cameras in public places record people's faces, and blurring those faces is one way to ensure personal privacy. The paper explores two deep-learning-based methods to achieve this:

1. **Direct method**: a classical object detector (based on the YOLO architecture) detects the faces in each frame, and the detected faces are then blurred.
2. **Indirect method**: a Unet-like segmentation network is trained to output the input image with all faces already blurred.

### Direct method

The direct method relies on the YOLOv5Face model, a face detector built on the YOLO architecture. Its workflow is as follows:

- **Face detection**: YOLOv5Face detects the faces in the image and returns their bounding-box coordinates.
- **Elliptical transformation**: each rectangular bounding box is converted into an ellipse, which gives a more natural-looking result.
- **Face blurring**: a Gaussian blur standard deviation is chosen according to the size of the detected face. The entire frame is blurred, and the face region of the original frame is replaced with the corresponding region of the blurred frame.

### Indirect method

The indirect method uses a Unet architecture similar to that of DeOldify, which was originally designed to colorize grayscale images. Once trained, the network directly outputs the image with blurred faces, with no explicit face-detection step. The main steps are:

- **Dataset construction**: images from the FDDB and WIDER FACE datasets are paired with face-blurred versions of themselves. The originals serve as training inputs and the blurred versions as targets.
- **Network architecture**: a Unet whose encoder is a pre-trained ResNet50 and whose decoder contains self-attention layers.
- **Inference**: the input image is down-sampled and fed to the network, and the output is up-sampled back to the original size. A mask of the blurred regions is then extracted by thresholding an L1 norm, and the corresponding regions of the original image are replaced with the blurred faces.

### Experiments and evaluation

The paper compares the two methods experimentally along two axes:

1. **Visual evaluation and face counting**: checking whether the blurring is complete, so that the people shown cannot be recognized, and counting the number of correctly blurred faces in each test image.
2. **Computation time**: comparing the inference times of the methods to assess their computational efficiency.

Overall, the paper explores and compares two deep-learning-based face-blurring techniques in order to identify the one best suited to automatic privacy protection.
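The direct pipeline can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the paper's code:

- The face boxes are assumed to come from a detector such as YOLOv5Face, in `(x0, y0, x1, y1)` pixel format.
- The ellipse is taken to be the one inscribed in each box.
- The size-to-sigma rule (`SIGMA_FRACTION`) is a hypothetical placeholder.

The Gaussian blur is written by hand only to keep the example dependency-free.

```python
import numpy as np

# Hypothetical rule: sigma proportional to the smaller side of the face box.
# The paper's exact size-to-sigma mapping is not reproduced here.
SIGMA_FRACTION = 0.1


def gaussian_kernel_1d(sigma: float) -> np.ndarray:
    """Normalized 1-D Gaussian kernel truncated at 3 * sigma."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()


def gaussian_blur(img: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur over the two spatial axes, with edge padding."""
    k = gaussian_kernel_1d(sigma)
    r = len(k) // 2
    out = img.astype(np.float64)
    for axis in (0, 1):  # blur along rows first, then along columns
        pad = [(0, 0)] * out.ndim
        pad[axis] = (r, r)
        padded = np.pad(out, pad, mode="edge")
        n = out.shape[axis]
        # Weighted sum of shifted copies = 1-D convolution (the kernel is symmetric).
        out = sum(w * np.take(padded, np.arange(i, i + n), axis=axis)
                  for i, w in enumerate(k))
    return out


def ellipse_mask(shape, box) -> np.ndarray:
    """Boolean H x W mask of the ellipse inscribed in box = (x0, y0, x1, y1)."""
    h, w = shape
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    ax, ay = max((x1 - x0) / 2, 1e-6), max((y1 - y0) / 2, 1e-6)
    yy, xx = np.mgrid[0:h, 0:w]
    # Test pixel centres (hence the +0.5) against the ellipse equation.
    return ((xx + 0.5 - cx) / ax) ** 2 + ((yy + 0.5 - cy) / ay) ** 2 <= 1.0


def blur_faces(frame: np.ndarray, boxes) -> np.ndarray:
    """Blur each detected face inside an elliptical region of the frame."""
    result = frame.astype(np.float64)  # astype returns a copy; `frame` is untouched
    for box in boxes:
        x0, y0, x1, y1 = box
        sigma = max(1.0, SIGMA_FRACTION * min(x1 - x0, y1 - y0))
        blurred = gaussian_blur(frame, sigma)      # blur the whole frame...
        mask = ellipse_mask(frame.shape[:2], box)  # ...then keep only the ellipse
        result[mask] = blurred[mask]
    return result
```

Blurring the full frame once per face follows the description above but is wasteful. A practical variant might cache one blurred frame per distinct sigma, or blur only a padded crop around each face. Both are optimizations the source does not describe.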
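The summary does not detail the decoder's self-attention layers. Assuming they follow the SAGAN-style self-attention used in DeOldify's generator, one such layer on a C x H x W feature map can be sketched in NumPy as below. The projection matrices stand in for learned 1x1 convolutions and are random here. The channel reduction C' = C/8 follows the SAGAN formulation, not anything stated in the source.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, w_q, w_k, w_v, gamma):
    """SAGAN-style self-attention on a C x H x W feature map.

    w_q, w_k: (C', C) matrices and w_v: (C, C), standing in for 1x1 convolutions.
    gamma: learned scalar; SAGAN starts it at 0, so the layer is initially the identity.
    """
    c, h, w = x.shape
    flat = x.reshape(c, h * w)        # C x N, one column per spatial position
    f = w_q @ flat                    # C' x N queries
    g = w_k @ flat                    # C' x N keys
    v = w_v @ flat                    # C  x N values
    # beta[i, j]: how much output position j attends to input position i.
    beta = softmax(f.T @ g, axis=0)   # N x N, each column sums to 1
    o = v @ beta                      # o[:, j] = sum_i beta[i, j] * v[:, i]
    return (gamma * o + flat).reshape(c, h, w)


# Usage with random (untrained) weights.
rng = np.random.default_rng(0)
C, H, W = 16, 4, 5
x = rng.standard_normal((C, H, W))
w_q, w_k = rng.standard_normal((2, C // 8, C))
w_v = rng.standard_normal((C, C))
y = self_attention(x, w_q, w_k, w_v, gamma=0.5)
print(y.shape)  # (16, 4, 5)
```

The residual form `gamma * o + x` lets every spatial position in the decoder draw on features from the whole image, for example from other faces or skin regions. Starting `gamma` at 0 means the layer initially leaves the Unet's features unchanged.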
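The inference step of the indirect method can be sketched as follows, under explicit assumptions:

- The L1 norm is taken over the channels of the per-pixel difference between the network output and its down-sampled input.
- Resizing uses average pooling and nearest-neighbour up-sampling for simplicity; the paper's interpolation method is not given here.
- `net`, `factor`, `threshold` and `toy_net` are hypothetical placeholders for the trained Unet and its settings.
- Masked pixels are filled with the up-sampled network output.

```python
import numpy as np


def downsample(img: np.ndarray, factor: int) -> np.ndarray:
    """Average-pool an H x W x C image by an integer factor (H, W divisible by it)."""
    h, w, c = img.shape
    return img.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))


def upsample(img: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour up-sampling by an integer factor along both spatial axes."""
    return img.repeat(factor, axis=0).repeat(factor, axis=1)


def blur_with_network(image, net, factor=4, threshold=30.0):
    """Run `net` at low resolution and paste the regions it changed into `image`.

    Hypothetical interface: `net` maps an H' x W' x C image to a same-sized
    image in which the faces are blurred.
    """
    small = downsample(image.astype(np.float64), factor)
    out_small = net(small)
    # Per-pixel L1 norm (summed over channels) of what the network changed.
    changed = np.abs(out_small - small).sum(axis=-1) > threshold
    mask = upsample(changed, factor)               # mask at full resolution
    result = image.astype(np.float64)              # copy of the original frame
    result[mask] = upsample(out_small, factor)[mask]
    return result, mask


def toy_net(small):
    """Toy stand-in for the Unet: 'blurs' a fixed square by replacing it with its mean."""
    out = small.copy()
    out[2:6, 2:6] = small[2:6, 2:6].mean(axis=(0, 1))
    return out


# Usage: a 32 x 32 RGB frame with a bright band inside the toy "face" square.
img = np.zeros((32, 32, 3))
img[8:16, 8:24] = 255.0
res, mask = blur_with_network(img, toy_net)
```

Computing the mask at low resolution and then up-sampling it, rather than differencing at full resolution, prevents down-sampling artefacts from being mistaken for blurred faces.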

Two Deep Learning Solutions for Automatic Blurring of Faces in Videos