Semantic Heads Segmentation and Counting in Crowded Retail Environment with Convolutional Neural Networks Using Top View Depth Images

Almustafa Abed, Belhassen Akrout, Ikram Amous
DOI: https://doi.org/10.1007/s42979-022-01467-5
2022-11-19
SN Computer Science
Abstract: In recent years, researchers have developed several techniques to accurately count the number of people in crowded retail environments for human behavior analysis, ranging from vision- and feature-based approaches to machine learning and deep learning approaches. With depth sensors and high computing power now available and affordable, and with large datasets at hand, there is a growing need to build accurate models that detect and count people in crowded environments. Deep learning approaches have proven to be highly accurate, especially when enough training data and computing power are available. People detection and counting is a challenging problem due to several issues, such as occlusion, lighting variations, and complex backgrounds, to name a few. To cope with these issues, we use depth data captured in a top-view configuration to train our model. We propose a convolutional encoder-decoder architecture for segmenting and counting customers' heads in crowded retail stores, where there are more than 6 individuals per square meter. The encoder is a ResNet50 trained on the ImageNet dataset and reused through transfer learning, and the decoder is built as a novel contribution. The approach does not compromise people's privacy because the camera does not record people's faces. The objective of our method is to segment and count people using the publicly available TV-Head Dataset and People Counting DataSet (PCDS). The results demonstrate that our model is robust and can be used for real-time people counting with accurate results.
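The abstract does not say how a head count is derived from the segmentation output. As a minimal sketch of one common way to do it, and not the authors' method, the Python below assumes the network output has already been thresholded into a binary head mask and counts 4-connected foreground blobs. The function name `count_heads`, the `min_area` filter, and the toy mask are all hypothetical and introduced only for illustration.

```python
from collections import deque


def count_heads(mask, min_area=1):
    """Count 4-connected foreground blobs in a binary mask.

    `mask` is a list of rows containing 0 (background) or 1 (head pixel).
    Blobs smaller than `min_area` pixels are ignored as noise.
    This is an illustrative assumption, not the paper's counting procedure.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Flood-fill one blob with breadth-first search and measure its area.
            area = 0
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                area += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if area >= min_area:
                count += 1
    return count


# Hypothetical toy mask with three separate blobs of 4, 2, and 1 pixels.
example_mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0],
]

print(count_heads(example_mask))              # counts all blobs
print(count_heads(example_mask, min_area=2))  # ignores single-pixel blobs
```

In a dense crowd, neighbouring heads can merge into a single blob, so a count obtained this way would tend to undercount unless the segmentation separates touching heads.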