On the Effectiveness of Facial Expression Recognition for Evaluation of Urban Sound Perception.
Qi Meng, Xuejun Hu, Jian Kang, Yue Wu
DOI: https://doi.org/10.1016/j.scitotenv.2019.135484
IF: 9.8
2019-01-01
The Science of The Total Environment
Abstract: Sound perception studies mostly depend on questionnaires with fixed indicators. Therefore, it is desirable to explore methods with dynamic outputs. The present study explores the effects of sound perception in the urban environment on facial expressions using FaceReader, a software package based on facial expression recognition (FER). The experiment involved three typical urban sound recordings, namely, traffic noise, natural sound, and community sound. A questionnaire on the evaluation of sound perception was also used for comparison. The results show that, first, FER is an effective tool for sound perception research, since it is capable of detecting differences in participants' reactions to different sounds and how their facial expressions change over time in response to those sounds, with mean differences in valence between recordings ranging from 0.019 to 0.059 (p < 0.05 or p < 0.01). In a natural sound environment, for example, facial expression increased by 0.04 in the first 15 s and then went down steadily at 0.004 every 20 s. Second, the expression indices happy, sad, and surprised change significantly under the effect of sound perception. In the traffic sound environment, for example, happy decreased by 0.012, sad increased by 0.032, and surprised decreased by 0.018. Furthermore, social characteristics such as distance from living place to natural environment (r = 0.313), inclination to communicate (r = 0.253), and preference for crowd (r = 0.296) have effects on facial expression. Finally, the comparison of FER and questionnaire survey results showed that in the traffic noise recording, valence in the first 20 s best represents acoustic comfort and eventfulness; for natural sound, valence in the first 40 s best represents pleasantness; and for community sound, valence in the first 20 s of the recording best represents acoustic comfort, subjective loudness, and calmness.
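The time course reported for the natural sound recording (a rise of 0.04 over the first 15 s, followed by a decline of 0.004 every 20 s) can be illustrated with a small piecewise-linear sketch. This is not the authors' model: the function name `valence_change`, the linear interpolation within each segment, and the reading of the reported change as a change in the valence index are all assumptions made here for illustration only.

```python
# Illustrative sketch only: a piecewise-linear reading of the figures quoted
# in the abstract for the natural sound recording. The linear shape within
# each segment is an assumption, not something the abstract states.

RISE_TOTAL = 0.04       # reported increase over the first 15 s
RISE_END_S = 15.0       # end of the rising phase, in seconds
DECLINE_STEP = 0.004    # reported decrease per 20 s afterwards
DECLINE_PERIOD_S = 20.0


def valence_change(t_seconds: float) -> float:
    """Approximate change relative to the start of the recording (hypothetical helper)."""
    if t_seconds < 0:
        raise ValueError("time must be non-negative")
    if t_seconds <= RISE_END_S:
        # Assumed linear rise from 0 to RISE_TOTAL over the first 15 s.
        return RISE_TOTAL * t_seconds / RISE_END_S
    # Assumed linear decline at 0.004 per 20 s after the peak.
    elapsed = t_seconds - RISE_END_S
    return RISE_TOTAL - DECLINE_STEP * elapsed / DECLINE_PERIOD_S


if __name__ == "__main__":
    for t in (0, 15, 35, 55):
        print(f"t = {t:>2} s: change = {valence_change(t):.3f}")
```

Under these assumptions, the change peaks at 15 s and is still positive well after a minute, which is consistent with the abstract's description of a quick rise followed by a slow, steady decline.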