Abstract: Automatic analysis and summarization of affective behaviors and personality from human-human interactions are becoming a central theme in many research areas, including computer science, the social sciences, and psychology. Affective behaviors are defined as short-term states that are brief in duration, arise in response to a relevant event or situation, and change rapidly over time. They include empathy, anger, frustration, satisfaction, and dissatisfaction. Personality is defined as an individual's longer-term characteristics that are stable over time and that describe the individual's true nature. These stable personality traits have been captured in psychology by the Big-5 model, which comprises openness, conscientiousness, extraversion, agreeableness, and neuroticism. Traditional approaches to measuring behavioral information and personality use either observer- or self-assessed questionnaires. Observers usually monitor the overt signals and label interactional scenarios, whereas self-assessors evaluate what they perceive from the interactional scenarios. From the measured behavioral and personality information, a descriptive summary is typically designed to improve domain experts' decision-making processes. However, such a manual approach is time-consuming and expensive, which motivated us to design automated computational models. A further motivation for studying affective behaviors and personality is to build a behavioral profile of an individual, from which one can understand or predict how that individual interprets or values a situation. The aim of the work presented in this dissertation is therefore to design automated computational models for analyzing affective behaviors (empathy, anger, frustration, satisfaction, and dissatisfaction) and Big-5 personality traits using behavioral signals expressed in conversational interactions.
Designing computational models for decoding affective behaviors and personality is a challenging problem because of the multifaceted nature of behavioral signals. During conversational interactions, many aspects of these signals are expressed and displayed through overt cues, in the form of verbal and vocal non-verbal expressions. These expressions also vary with the type of interaction, context, or situation, such as phone conversations, face-to-machine, face-to-face, and social media interactions. Designing computational models therefore requires investigating 1) different overt cues expressed in several experimental contexts in real settings, 2) verbal and vocal non-verbal expressions in terms of linguistic, visual, and acoustic cues, and 3) the combination of information from multiple channels such as linguistic, visual, and acoustic information. Regarding computational models of affective behaviors, the contributions of the work presented here are 1) an analysis of call center conversations containing agents' and customers' speech, 2) a treatment of the issues related to segmentation and annotation, by defining operational guidelines for annotating the empathy of the agent and other emotional states of the customer on real call center data, 3) a demonstration of how different channels of information, such as acoustic, linguistic, and psycholinguistic channels, can be combined to improve performance on both conversation-level and segment-level classification tasks, and 4) the development of a computational pipeline for designing affective scenes, i.e., the emotional sequence of the interlocutors, from a dyadic conversation. In designing models for Big-5 personality traits, we addressed two important problems: personality recognition, which infers self-assessed personality, and personality perception, which infers the personality that observers attribute to an individual. The contributions of this work to personality research are 1) an investigation of several scenarios, such as broadcast news, human-human spoken conversations from a call center, social media posts such as Facebook status updates, and multi-modal YouTube blogs, 2) the design of classification models using acoustic, linguistic, and psycholinguistic features, and 3) an investigation of several feature-level and decision-level combination strategies. The studies conducted in this work demonstrate that fusing various sources of information is beneficial for designing automated computational models. The computational models for affective behaviors and personality presented here are fully automated and effective; they do not require any human intervention. The outcome of this research is potentially relevant to the automatic analysis of human interactions in several sectors, such as customer care, education, and healthcare.
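The distinction between feature-level and decision-level combination mentioned above can be illustrated with a minimal sketch. It is not the dissertation's implementation: the function names, the toy feature values, and the weighted-average fusion rule are all assumptions chosen for illustration. Feature-level fusion is shown as column-wise concatenation of per-channel feature matrices before a single classifier is trained; decision-level fusion is shown as a weighted average of class posteriors produced by separate per-channel classifiers.

```python
import numpy as np


def feature_level_fusion(*channel_features):
    """Concatenate per-channel feature matrices (n_samples x d_k) column-wise.

    A single classifier would then be trained on the fused matrix.
    """
    return np.hstack(channel_features)


def decision_level_fusion(channel_probs, weights=None):
    """Combine per-channel class posteriors by a weighted average.

    channel_probs: list of arrays, each (n_samples x n_classes).
    weights: optional per-channel weights (hypothetical; uniform by default).
    Returns the fused posteriors and the predicted class indices.
    """
    probs = np.stack(channel_probs)  # (n_channels, n_samples, n_classes)
    if weights is None:
        weights = np.ones(len(channel_probs))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so the weights sum to 1
    fused = np.tensordot(weights, probs, axes=1)  # (n_samples, n_classes)
    return fused, fused.argmax(axis=1)


# Toy data (made up): two samples, an acoustic and a linguistic channel.
acoustic = np.array([[0.1, 0.2], [0.3, 0.4]])
linguistic = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
fused_features = feature_level_fusion(acoustic, linguistic)

# Made-up posteriors from two hypothetical per-channel classifiers.
acoustic_probs = np.array([[0.6, 0.4], [0.3, 0.7]])
linguistic_probs = np.array([[0.2, 0.8], [0.4, 0.6]])
fused_probs, labels = decision_level_fusion([acoustic_probs, linguistic_probs])
```

With uniform weights, the decision-level rule reduces to a plain average of the channel posteriors; other rules (majority voting, a second-stage classifier over the posteriors) fit the same interface.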

IMPACT OF VISUAL MODALITIES IN MULTIMODAL PERSONALITY AND AFFECTIVE COMPUTING

Integrating audio and visual modalities for multimodal personality trait recognition via hybrid deep learning

Multimodal Emotion Recognition by Combining Physiological Signals and Facial Expressions: a Preliminary Study.

Investigating Audio, Visual, and Text Fusion Methods for End-to-End Automatic Personality Prediction

Cross Attentional Audio-Visual Fusion for Dimensional Emotion Recognition

Multimodal Affective State Assessment Using fNIRS + EEG and Spontaneous Facial Expression

Multimodal Affective Computing Based on Weighted Linear Fusion

Multimodal Video-based Apparent Personality Recognition Using Long Short-Term Memory and Convolutional Neural Networks

Enhancing Apparent Personality Trait Analysis with Cross-Modal Embeddings

Multimodal emotion recognition using cross modal audio-video fusion with attention and deep metric learning

A Comprehensive Multimodal Humanoid System for Personality Assessment Based on the Big Five Model

Multimodal Prediction of Affective Dimensions and Depression in Human-Computer Interactions

Fusion in Context: A Multimodal Approach to Affective State Recognition

Temporal aggregation of audio-visual modalities for emotion recognition

A Novel Emotion-Aware Method Based on the Fusion of Textual Description of Speech, Body Movements, and Facial Expressions

Facial Expression and Peripheral Physiology Fusion to Decode Individualized Affective Experience

Multimodal analysis of personality traits on videos of self-presentation and induced behavior

Computational Models for Analyzing Affective Behaviors and Personality from Speech and Text

Multimodal Utterance-level Affect Analysis using Visual, Audio and Text Features

Multimodal emotion recognition based on a fusion of audiovisual information with temporal dynamics

A Multimodal Sentiment Analysis Approach Based on a Joint Chained Interactive Attention Mechanism