Collection and Analysis of Emotional Speech Focused on the Psychological and Acoustical Diversity

Takahiro Miyajima, H. Kikuchi, K. Shirai
Abstract: How to effectively collect diverse spontaneous speech data and to improve various speech processing techniques is a serious issue. In this paper, we introduce our proposed method, the “SEN method”, which aims to collect psychologically and acoustically diverse acted speech effectively while preserving naturalness. In consultation with a professional actress, we created detailed directions based on various real-life scenarios so that an actor or actress can more easily reproduce diverse expressions. We compared fifty speech samples collected by the SEN method with fifty collected by the legacy method, in which simple basic emotional words are used as prompts. In the psychological space, the SEN data filled low-density areas in the space of the legacy method. To identify the causes of this phenomenon, we analyzed the relationship between the psychological and acoustical features. Our results demonstrate the advantage of the SEN method: it generates psychologically diverse speech that cannot be described by representative acoustical features.