Spontaneous Gestures Encoded by Hand Positions Improve Language Models: An Information-Theoretic Motivated Study

Yang Xu, Yang Cheng
DOI: https://doi.org/10.18653/v1/2023.findings-acl.600
2023-01-01
Abstract: The multi-modality nature of human communication has been utilized to enhance the performance of language modeling-related tasks. Driven by the development of large-scale end-to-end learning techniques and the availability of multi-modal data, it becomes possible to represent non-verbal communication behaviors through joint-learning, and directly study their interaction with verbal communication. However, there are still gaps in existing studies to better address the underlying mechanism of how non-verbal expression contributes to the overall communication purpose. Therefore, we explore two questions using mixed-modal language models trained against monologue video data: first, whether incorporating gesture representations can improve the language model's performance (perplexity); second, whether spontaneous gestures demonstrate entropy rate constancy (ERC), which is an empirical pattern found in most verbal language data that supports the rational communication assumption from Information Theory. We have positive and interesting findings for both questions: speakers indeed use spontaneous gestures to convey "meaningful" information that enhances verbal communication, which can be captured with a simple spatial encoding scheme. More importantly, gestures are produced and organized rationally in a similar way as words, which optimizes communication efficiency.
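The abstract relies on two quantities it does not define: perplexity and entropy rate constancy (ERC). The Python sketch below illustrates both under stated assumptions; it is not the authors' pipeline. Perplexity is computed as the exponential of the mean negative natural-log probability per token. For ERC, it assumes each sentence's entropy is estimated as the mean per-token negative log-probability from a model conditioned on the preceding context, and it summarizes the trend across sentence positions with an ordinary least-squares slope; if ERC holds, that slope should be close to zero. The function `encode_hand_position` and its 3x3 grid are hypothetical stand-ins for the paper's "simple spatial encoding scheme", whose exact design is not described here.

```python
import math
from collections import defaultdict


def perplexity(token_log_probs):
    """Perplexity = exp(mean negative natural-log probability per token)."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def encode_hand_position(x, y, grid=3):
    """HYPOTHETICAL spatial encoding: map a normalized (x, y) in [0, 1]
    to a discrete grid-cell token that a language model could consume."""
    # Clamp the cell indices so negatives map to 0 and x or y >= 1.0
    # map to the last row/column instead of falling outside the grid.
    col = min(int(max(x, 0.0) * grid), grid - 1)
    row = min(int(max(y, 0.0) * grid), grid - 1)
    return f"<pos_{row}_{col}>"


def entropy_by_position(sentences):
    """Mean per-token entropy estimate (NLL in nats) for each sentence position.

    `sentences` is a list of (position, token_log_probs) pairs, where the
    log-probabilities are ASSUMED to come from a context-conditioned model.
    """
    buckets = defaultdict(list)
    for position, log_probs in sentences:
        buckets[position].append(-sum(log_probs) / len(log_probs))
    return {p: sum(v) / len(v) for p, v in sorted(buckets.items())}


def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    if var == 0:
        raise ValueError("xs must contain at least two distinct values")
    return cov / var


if __name__ == "__main__":
    # A model that assigns probability 0.5 to every token has perplexity 2.
    print(perplexity([math.log(0.5)] * 4))
    print(encode_hand_position(0.1, 0.9))
    # Toy data: equal per-token entropy at every position, so the slope is ~0.
    ent = entropy_by_position(
        [(1, [math.log(0.5)] * 2), (2, [math.log(0.5)] * 3), (3, [math.log(0.5)])]
    )
    print(slope(list(ent), list(ent.values())))
```

A near-zero slope on real data would be consistent with ERC, but on its own it is not a significance test; a careful analysis would also test the trend statistically across many monologues.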