A Multi-view Mask Contrastive Learning Graph Convolutional Neural Network for Age Estimation

Yiping Zhang, Yuntao Shou, Tao Meng, Wei Ai, Keqin Li
2024-07-23
Abstract: The age estimation task aims to use facial features to predict the age of people and is widely used in public security, marketing, identification, and other fields. However, the features are mainly concentrated in facial keypoints, and existing CNN and Transformer-based methods are inflexible and redundant when modeling complex irregular structures. Therefore, this paper proposes a Multi-view Mask Contrastive Learning Graph Convolutional Neural Network (MMCL-GCN) for age estimation. Specifically, the overall structure of the MMCL-GCN network contains a feature extraction stage and an age estimation stage. In the feature extraction stage, we introduce a graph structure to construct face images as input and then design a Multi-view Mask Contrastive Learning (MMCL) mechanism to learn complex structural and semantic information about face images. The learning mechanism employs an asymmetric siamese network architecture, which utilizes an online encoder-decoder structure to reconstruct the missing information from the original graph and utilizes the target encoder to learn latent representations for contrastive learning. Furthermore, to make the two learning mechanisms more compatible and complementary, we adopt two augmentation strategies and optimize the joint losses. In the age estimation stage, we design a Multi-layer Extreme Learning Machine (ML-IELM) with identity mapping to fully use the features extracted by the online encoder. Then, a classifier and a regressor are constructed based on ML-IELM; they are used to identify the age grouping interval and accurately estimate the final age. Extensive experiments show that MMCL-GCN can effectively reduce the error of age estimation on benchmark datasets such as Adience, MORPH-II, and LAP-2016.
Computer Vision and Pattern Recognition, Computation and Language
What problem does this paper attempt to address?
The problem that this paper attempts to solve is how to use facial features to predict people's ages more accurately in the age estimation task. Existing methods based on Convolutional Neural Networks (CNN) and Transformers have problems of insufficient flexibility and redundancy when dealing with complex and irregularly structured data. Therefore, this paper proposes a Multi-View Masked Contrastive Learning Graph Convolutional Neural Network (MMCL-GCN) to improve the accuracy of age estimation.

### Main contributions of the paper:

1. **Proposed a new image feature extraction method**: Combined with the Multi-View Masked Contrastive Learning (MMCL) framework, it can effectively integrate contrastive learning and masked image reconstruction, thereby learning the latent discriminative features and high-level local features of the image.
2. **Designed a Multi-Layer Extreme Learning Machine (ML-IELM)**: By introducing the identity mapping, classifiers and regressors are constructed to reduce the error in age estimation.
3. **Experimental verification**: Extensive experiments were carried out on three popular benchmark datasets (MORPH-II, LAP-2016 and Adience), and the results show that MMCL-GCN outperforms existing comparison algorithms in terms of mean absolute error and normal score (ϵ-error).

### Method overview:

- **Feature extraction stage**:
  - **Graph structure construction**: Convert the input two-dimensional image into a graph structure, where each image block is a node of the graph.
  - **Multi-View Masked Contrastive Learning (MMCL)**: Utilize an asymmetric Siamese network architecture, reconstruct the masked information through an online encoder-decoder structure, and perform contrastive learning through a target encoder. Optimize the reconstruction loss and the contrastive learning objective to obtain a robust and efficient online encoder.
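
The graph structure construction step above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `image_to_patch_graph`, the non-overlapping patch grid, and the k-nearest-neighbour edge rule in raw pixel space are all assumptions, since the summary only states that each image block becomes a node of the graph.

```python
import math


def image_to_patch_graph(image, patch_size, k):
    """Split a 2-D grayscale image into patches and link each patch to its k nearest patches.

    image: list of equal-length rows of numbers (H x W).
    Returns (nodes, edges): nodes[i] is the flattened pixel vector of patch i,
    edges is a set of undirected (i, j) pairs with i < j.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be multiples of patch_size")

    # One node per non-overlapping patch, scanned row by row.
    nodes = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            nodes.append([
                image[r][c]
                for r in range(top, top + patch_size)
                for c in range(left, left + patch_size)
            ])

    # Assumed edge rule (not stated in the summary): connect each node to its
    # k nearest neighbours by Euclidean distance between raw pixel vectors.
    edges = set()
    for i, feat_i in enumerate(nodes):
        dists = sorted(
            (math.dist(feat_i, feat_j), j)
            for j, feat_j in enumerate(nodes) if j != i
        )
        for _, j in dists[:k]:
            edges.add((min(i, j), max(i, j)))
    return nodes, edges


# Usage: a 4x4 image with 2x2 patches gives 4 nodes.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [1, 1, 8, 8],
    [1, 1, 8, 8],
]
nodes, edges = image_to_patch_graph(image, patch_size=2, k=1)
print(len(nodes), sorted(edges))  # 4 [(0, 2), (1, 3)]
```

In this toy image the two dark patches are linked to each other and the two bright patches are linked to each other. In a masked-learning setup, a subset of these nodes would then be hidden before encoding; that step is not shown here.
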
- **Age estimation stage**:
  - **Multi-Layer Extreme Learning Machine (ML-IELM)**: Use the facial features extracted by the online encoder to perform preliminary age grouping and final age prediction through the multi-layer extreme learning machine. Specifically, use the ML-IELM classifier for preliminary age grouping, and then use the ML-IELM regressor for final age prediction.

### Formula representation:

- **Basic expression of Graph Convolutional Neural Network (GCN)**:
  \[ H^{l+1} = \sigma\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{l} \Theta^{l}\right) \]
  where \(\tilde{A} = A + I\), \(I\) is the identity matrix, \(\tilde{D}\) is the degree matrix of \(\tilde{A}\), \(H^{0}\) is the feature vector of the input layer, \(H^{l}\) is the node embedding of the \(l\)-th layer, \(\Theta^{l}\) is the trainable weight, and \(\sigma\) is the activation function.
- **Graph feature aggregation and update**:
  \[ a_{v}^{(k)} = \text{AGG}^{(k)}\left(\{h_{v'}^{(k-1)} : v' \in k(v)\}\right) \]
  \[ h_{v}^{(k)} = \text{COM}^{(k)}\left(h_{v}^{(k-1)}, a_{v}^{(k)}\right) \]
  where \(a_{v}^{(k)}\) is the feature embedding of the aggregated neighbor nodes, \(h_{v}^{(k)}\) is the node embedding of the \(k\)-th layer, \(k(v)\) is the neighbor set of node \(v\), and \(\text{AGG}^{(k)}\) and \(\text{COM}^{(k)}\) are the aggregation function and combination function of the graph neural network layer respectively.
- **Multi-Layer Perceptron (MLP) for downstream tasks**:
  \[ h_{G} = \text{READOUT}\left(\{h_{v}^{(k)} : v \in G\}\right) \]
  \[ z_{G} = \text{MLP}(h_{G}) \]
- **Parameter update**: \[ \Theta