Towards Lexical Analysis of Dog Vocalizations via Online Videos

Yufei Wang, Chunhao Zhang, Jieyi Huang, Mengyue Wu, Kenny Zhu
2023-09-22
Abstract: Deciphering the semantics of animal language has been a grand challenge. This study presents a data-driven investigation into the semantics of dog vocalizations by correlating different sound types with consistent semantics. We first present a new dataset of Shiba Inu sounds, along with contextual information such as location and activity, collected from YouTube with a well-constructed pipeline. The framework is also applicable to other animal species. Based on an analysis of the conditional probability between dog vocalizations and their corresponding location and activity, we find supporting evidence for previous heuristic research on the semantic meaning of various dog sounds. For instance, growls can signify interactions. Furthermore, our study yields new insights: existing word types can be subdivided into finer-grained subtypes, and the minimal semantic unit for Shiba Inu is word-related. For example, whimper can be subdivided into two types, attention-seeking and discomfort.
Sound, Computation and Language, Machine Learning, Audio and Speech Processing
What problem does this paper attempt to address?
This paper aims to understand the lexical semantics of dog vocalizations and to identify their minimal semantic units. Specifically, the researchers analyze the types of Shiba Inu vocalizations produced in different situations to reveal the meanings these sounds carry. The goals of the paper are:

1. **Determine whether dogs use consistent vocalization patterns to express specific meanings**: The researchers want to know whether dogs use particular vocalization patterns to convey particular information across different scenarios.
2. **Calculate the correlation between vocalizations and the factors that may give rise to different meanings**: This involves quantifying and analyzing the relationship between vocalizations and contextual factors (such as location and activity) to reveal the meanings behind the sounds.

To answer these questions, the researchers address the following technical challenges:

- **Classification of vocalization types**: Define a "word" as an independent, continuous segment of dog vocalization, usually lasting about 1 second, and segment the audio into "words" by detecting the transitions between silent frames and dog-voice frames.
- **Extraction of context information**: Define a diverse and comprehensive list of locations and activities, and use corresponding extraction methods to obtain the context, including location and activity, in which each vocalization segment occurs.

With these methods, the researchers construct a large-scale, timestamp-aligned dataset of <word, sub-word, location, activity> quadruples for in-depth analysis of the lexical semantics of dog vocalizations and their minimal semantic units. The research offers new insights into dog vocalizations and provides an extensible data-processing framework for similar studies in the future.
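
The two technical ideas mentioned above, segmenting "words" at transitions between silent and dog-voice frames and relating each word type to its context through conditional probability, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names `segment_words` and `conditional_probability`, the per-frame boolean labels, the `min_frames` filter, and the sample data are all hypothetical, and the sketch assumes some upstream detector has already marked each audio frame as dog voice or silence.

```python
from collections import Counter, defaultdict


def segment_words(frame_is_dog, min_frames=1):
    """Group consecutive dog-voice frames into (start_frame, end_frame) spans.

    frame_is_dog: sequence of booleans, one per audio frame (assumed input;
    how frames are labeled is outside this sketch). end_frame is exclusive.
    """
    segments = []
    start = None
    for i, active in enumerate(frame_is_dog):
        if active and start is None:
            start = i  # transition: silence -> dog voice
        elif not active and start is not None:
            if i - start >= min_frames:  # transition: dog voice -> silence
                segments.append((start, i))
            start = None
    # Close a segment that runs to the end of the audio.
    if start is not None and len(frame_is_dog) - start >= min_frames:
        segments.append((start, len(frame_is_dog)))
    return segments


def conditional_probability(pairs):
    """Estimate P(context | word) by counting (word, context) pairs."""
    pairs = list(pairs)
    word_counts = Counter(word for word, _ in pairs)
    joint_counts = Counter(pairs)
    table = defaultdict(dict)
    for (word, context), n in joint_counts.items():
        table[word][context] = n / word_counts[word]
    return dict(table)


if __name__ == "__main__":
    # Made-up frame labels and annotations, for illustration only.
    frames = [False, True, True, False, False, True, False, True]
    print(segment_words(frames))
    samples = [("growl", "play"), ("growl", "play"),
               ("growl", "eat"), ("whimper", "alone")]
    print(conditional_probability(samples))
```

Frame indices can be converted to timestamps by multiplying by the frame duration, which is what allows each segment to be aligned with the location and activity observed at the same moment in the video.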