Abstract: Audio features such as inharmonicity, noisiness, and spectral roll-off have been identified as correlates of "noisy" sounds. However, such features are likely involved in the experience of multiple semantic timbre categories of varied meaning and valence. This paper examines the relationships of stimulus properties and audio features with the semantic timbre categories raspy/grainy/rough, harsh/noisy, and airy/breathy. Participants (n = 153) rated a random subset of 52 stimuli from a set of 156 approximately 2-s orchestral instrument sounds representing varied instrument families (woodwinds, brass, strings, percussion), registers (octaves 2 through 6, where middle C is in octave 4), and both traditional and extended playing techniques (e.g., flutter-tonguing, bowing at the bridge). Stimuli were rated on the three semantic categories of interest, as well as on perceived playing exertion and emotional valence. Correlational analyses demonstrated a strong negative relationship between positive valence and perceived physical exertion. Exploratory linear mixed models revealed significant effects of extended technique and pitch register on valence, the perception of physical exertion, raspy/grainy/rough, and harsh/noisy. Instrument family was significantly related to ratings of airy/breathy. With an updated version of the Timbre Toolbox (R-2021 A), we used 44 summary audio features, extracted from the stimuli using spectral and harmonic representations, as input for various models built to predict mean semantic ratings for each sound on the three semantic categories, on perceived exertion, and on valence. Random Forest models predicting semantic ratings from audio features outperformed Partial Least-Squares Regression models, consistent with previous results suggesting that non-linear methods are advantageous in timbre semantic predictions using audio features. Relative Variable Importance measures from the models among the three semantic categories demonstrate that although these related semantic categories are associated in part with overlapping features, they can be differentiated through individual patterns of audio feature relationships.
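The rating design described in the abstract (153 participants, each rating a random 52 of the 156 stimuli, with ratings averaged per sound before correlation) can be illustrated with a minimal sketch. Only the three design numbers come from the abstract. The rating values, the noise level, the inverse link between exertion and valence, and all function and variable names are assumptions made for illustration; this is not the authors' analysis code or data.

```python
import math
import random
from statistics import mean

N_PARTICIPANTS = 153   # from the abstract
N_STIMULI = 156        # from the abstract
SUBSET_SIZE = 52       # stimuli rated per participant, from the abstract


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def simulate_mean_ratings(seed=0):
    """Return per-stimulus rating counts and mean exertion/valence ratings.

    All rating values are synthetic (assumption): each stimulus gets a
    hypothetical latent exertion level, and valence is generated as its
    inverse plus noise, purely to produce a negative relationship.
    """
    rng = random.Random(seed)
    latent_exertion = [rng.random() for _ in range(N_STIMULI)]

    exertion = [[] for _ in range(N_STIMULI)]
    valence = [[] for _ in range(N_STIMULI)]
    for _ in range(N_PARTICIPANTS):
        # Each participant rates a random subset of 52 of the 156 stimuli.
        for s in rng.sample(range(N_STIMULI), SUBSET_SIZE):
            exertion[s].append(latent_exertion[s] + rng.gauss(0, 0.2))
            valence[s].append(1 - latent_exertion[s] + rng.gauss(0, 0.2))

    counts = [len(r) for r in exertion]
    return counts, [mean(r) for r in exertion], [mean(r) for r in valence]


counts, mean_exertion, mean_valence = simulate_mean_ratings()
# 153 * 52 / 156 = 51 ratings per stimulus on average.
print("mean ratings per stimulus:", sum(counts) / N_STIMULI)
print("Pearson r, exertion vs. valence:",
      round(pearson_r(mean_exertion, mean_valence), 3))
```

Under this design each sound receives 51 ratings on average, so the per-sound means that feed the correlational analyses and the predictive models rest on roughly a third of the participant pool rather than on all 153 raters.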

Modeling Noise-Related Timbre Semantic Categories of Orchestral Instrument Sounds With Audio Features, Pitch Register, and Instrument Family
