Regression Guided by Relative Ranking Using Convolutional Neural Network (R³CNN)

Luojun Lin, Lingyu Liang, Lianwen Jin
DOI: https://doi.org/10.1109/TAFFC.2019.2933523
IF: 13.99
2022-01-01
IEEE Transactions on Affective Computing
Abstract: Facial beauty prediction (FBP) aims to automatically assess facial attractiveness consistently with judgements based on human perception. Most previous methods formulate FBP as a classification, regression, or ranking problem in machine learning. However, humans not only represent facial attractiveness as a score but also perceive the relative aesthetics of faces. Inspired by this observation, we formulate FBP as a specific regression problem guided by ranking information. Specifically, we propose a general CNN architecture, called R³CNN, that integrates the relative aesthetic ranking of faces to improve FBP performance. Because R³CNN consists of both regression and ranking components, it is challenging to train and fine-tune with existing techniques. To tackle this problem, we propose the following learning schemes for R³CNN: 1) a hard pair sampling strategy that generates hard-to-predict image pairs and pseudo ranking labels from true rating scores; 2) an ensemble loss function that combines a regression loss and a pairwise ranking loss (PR-Loss), where PR-Loss can be a hinge-form loss or a log-sum-exp pairwise loss; 3) a cascaded fine-tuning method that further improves prediction.
Moreover, we build a benchmark dataset, called SCUT-FBP5500, containing 5,500 facial images with diverse properties (male/female, Asian/Caucasian, various ages) and labels (facial landmarks, rating scores within [1, 5], and rating score distributions). Experiments were performed on both the SCUT-FBP and SCUT-FBP5500 benchmark datasets, where our method achieves state-of-the-art performance under different evaluation settings. Comparisons with related CNN models highlight the effectiveness of the R³CNN architecture for FBP.
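The first two learning schemes in the abstract (pseudo ranking labels derived from true rating scores, and a loss that adds a hinge-form or log-sum-exp PR-Loss to a regression loss) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The "hard" pair criterion (a maximum score gap `max_gap`), the mean-squared-error regression term, the hinge `margin`, the weight `lam`, and all function names are hypothetical choices made for illustration; the cascaded fine-tuning scheme is not covered.

```python
import itertools

import numpy as np


def pseudo_rank_pairs(scores, max_gap=0.5):
    """Turn true rating scores into (i, j, y) pairs with pseudo ranking labels.

    y = +1 if face i is rated higher than face j, -1 if it is rated lower.
    Tied pairs carry no ranking signal and are skipped.
    ASSUMPTION: a pair counts as "hard" when its true scores differ by at most
    `max_gap`; this threshold rule is hypothetical, not taken from the paper.
    """
    pairs = []
    for i, j in itertools.combinations(range(len(scores)), 2):
        gap = scores[i] - scores[j]
        if gap == 0 or abs(gap) > max_gap:
            continue
        pairs.append((i, j, 1.0 if gap > 0 else -1.0))
    return pairs


def hinge_pr_loss(pred, pairs, margin=0.1):
    """Hinge-form PR-Loss: mean of max(0, margin - y * (p_i - p_j)) over pairs."""
    if not pairs:
        return 0.0
    terms = [max(0.0, margin - y * (pred[i] - pred[j])) for i, j, y in pairs]
    return float(np.mean(terms))


def lsep_pr_loss(pred, pairs):
    """Log-sum-exp PR-Loss: log(1 + sum over pairs of exp(-y * (p_i - p_j)))."""
    if not pairs:
        return 0.0
    z = np.array([-y * (pred[i] - pred[j]) for i, j, y in pairs])
    # Prepending 0 supplies the "1 +" term; logaddexp.reduce avoids overflow.
    return float(np.logaddexp.reduce(np.concatenate(([0.0], z))))


def combined_loss(pred, target, pairs, rank="hinge", lam=1.0):
    """Regression loss plus lam * PR-Loss (the MSE regression term is an assumption)."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    reg = float(np.mean((pred - target) ** 2))
    pr = hinge_pr_loss(pred, pairs) if rank == "hinge" else lsep_pr_loss(pred, pairs)
    return reg + lam * pr


if __name__ == "__main__":
    scores = np.array([3.0, 3.5, 1.0, 3.25])  # toy ratings in [1, 5]
    pairs = pseudo_rank_pairs(scores)
    print(pairs)  # [(0, 1, -1.0), (0, 3, -1.0), (1, 3, 1.0)]
    print(combined_loss(scores, scores, pairs))  # 0.0
```

In an actual R³CNN the predictions would come from the network's regression output and the loss would be minimized by gradient descent in a deep learning framework; plain arrays are used here only so that the arithmetic is easy to check. Note that the hinge form is exactly zero once every sampled pair is ordered correctly by at least `margin`, while the log-sum-exp form stays positive and keeps pushing correctly ordered pairs further apart.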