Seeing and hearing what has not been said; A multimodal client behavior classifier in Motivational Interviewing with interpretable fusion

Lucie Galland, Catherine Pelachaud, Florian Pecune
2023-09-27
Abstract: Motivational Interviewing (MI) is an approach to therapy that emphasizes collaboration and encourages behavioral change. To evaluate the quality of an MI conversation, client utterances can be classified using the MISC code as either change talk, sustain talk, or follow/neutral talk. The proportion of change talk in an MI conversation is positively correlated with therapy outcomes, making accurate classification of client utterances essential. In this paper, we present a classifier that accurately distinguishes between the three MISC classes (change talk, sustain talk, and follow/neutral talk) leveraging multimodal features such as text, prosody, facial expressivity, and body expressivity. To train our model, we perform annotations on the publicly available AnnoMI dataset to collect multimodal information, including text, audio, facial expressivity, and body expressivity. Furthermore, we identify the most important modalities in the decision-making process, providing valuable insights into the interplay of different modalities during an MI conversation.
Machine Learning, Artificial Intelligence, Computation and Language, Audio and Speech Processing
What problem does this paper attempt to address?
This paper addresses the automatic classification of client utterances in Motivational Interviewing (MI). Specifically, it aims to develop a multimodal classifier that accurately distinguishes among the three MISC categories (change talk, sustain talk, and follow/neutral talk) used to evaluate the quality of MI conversations. The model draws on features from several modalities (text, prosody, facial expressivity, and body expressivity) to improve classification accuracy. The paper also emphasizes interpretability: the classifier identifies which modalities play a crucial role in the decision-making process, providing insight into how the modalities interact during MI conversations.
The main contributions of the paper are:
1. A MISC classifier that uses three modalities (text, prosody, and non-verbal behavior).
2. A classifier that identifies the modalities that play a crucial role in the decision-making process, which helps practitioners understand why the classifier makes a specific decision.
Through these contributions, the paper aims to make the quality assessment of MI conversations more efficient and accurate, and applicable in real-time or human-machine conversation settings.
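To make the idea of a fusion step that exposes per-modality importance more concrete, the following is a minimal sketch and not the authors' architecture, which this summary does not describe. It assumes each modality produces its own logits over the three MISC classes and that a scalar gate score per modality is available; a softmax over the gate scores gives weights that sum to 1 and can be read as modality importance. The names `fuse`, `change_talk_proportion`, the gate scores, and all numeric values are hypothetical and chosen only for illustration.

```python
import math

# The three MISC client classes named in the abstract.
CLASSES = ["change", "sustain", "neutral"]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(modality_logits, gate_scores):
    """Weighted late fusion (illustrative assumption, not the paper's model).

    modality_logits: dict mapping modality name -> list of 3 class logits
    gate_scores:     dict mapping modality name -> scalar gate score
    Returns the predicted class and the per-modality weights.
    """
    names = list(modality_logits)
    # Softmax over gate scores: weights are positive and sum to 1,
    # so they can be inspected as relative modality importance.
    weights = softmax([gate_scores[n] for n in names])
    fused = [0.0] * len(CLASSES)
    for w, name in zip(weights, names):
        for i, logit in enumerate(modality_logits[name]):
            fused[i] += w * logit
    probs = softmax(fused)
    label = CLASSES[probs.index(max(probs))]
    return label, dict(zip(names, weights))


def change_talk_proportion(labels):
    """Share of utterances labelled as change talk in one conversation."""
    return labels.count("change") / len(labels) if labels else 0.0


if __name__ == "__main__":
    # Hypothetical logits and gate scores for a single client utterance.
    logits = {
        "text": [2.0, 0.1, 0.3],
        "prosody": [0.2, 0.4, 0.1],
        "face": [0.1, 0.2, 0.3],
        "body": [0.0, 0.1, 0.2],
    }
    gates = {"text": 2.0, "prosody": 0.5, "face": 0.0, "body": 0.0}
    label, weights = fuse(logits, gates)
    print(label, weights)
    print(change_talk_proportion(["change", "sustain", "change", "neutral"]))
```

In this sketch the returned weights are what a practitioner would inspect to see which modality drove a decision, and `change_talk_proportion` computes the conversation-level quantity that the abstract links to therapy outcomes.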