Abstract: With the growth of the Internet and related technology, online music platforms and music streaming services are flourishing. The abundance of digital music has made information overload a common problem for users. Social tags have been discussed as a useful aid to music recommendation. However, label sparsity and the cold-start problem, both commonly observed with social tags, limit how well they can support a recommendation system. A music autotagging system is therefore an alternative way to compensate for a shortage of tags. Most prior studies on automatic labeling analyzed only audio data. However, some studies have suggested that lyrics give a music classification system additional information and improve its overall accuracy. Alongside lyrics, audio data remain an important resource for finding music features. This paper therefore proposes a music autotagging system that relies on both audio and lyrics to address the above problems. With the development of deep learning algorithms in recent years, many scholars have effectively used neural networks to extract audio and textual features. Some have also taken the structure of lyrics into account when extracting features, which in turn improves classification. For lyric feature extraction, this study employs two types of deep learning models: convolutional neural networks (CNNs) and recurrent neural networks (RNNs). The feature extraction architecture is motivated mainly by the structure of lyrics. In addition, a multitask learning method is adopted to learn correlations between tags. The experiments indicate that a multitask learning classifier combining audio and lyric information performs better than the single-task classification methods of previous studies that use only audio data.
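The idea of a multitask classifier over combined audio and lyric information can be illustrated with a minimal sketch. It is not the architecture used in the paper, which the abstract does not specify. It assumes that audio and lyric features have already been extracted as fixed-length vectors, that fusion is plain concatenation, and that multitask learning takes the form of one shared hidden layer with a separate sigmoid output per tag. The names `MultiTaskTagger` and `multitask_loss`, the dimensions, and the input values are all hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_matrix(rows, cols, rng):
    # Small random weights; a real system would learn these by training.
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


class MultiTaskTagger:
    """Hypothetical model: shared hidden layer, one sigmoid head per tag."""

    def __init__(self, audio_dim, lyric_dim, hidden_dim, num_tags, seed=0):
        rng = random.Random(seed)
        # Shared layer: every tag's prediction depends on these weights.
        self.shared = make_matrix(hidden_dim, audio_dim + lyric_dim, rng)
        # One output row per tag (the task-specific part).
        self.heads = make_matrix(num_tags, hidden_dim, rng)

    def forward(self, audio_features, lyric_features):
        # Assumed fusion strategy: concatenate the two feature vectors.
        fused = list(audio_features) + list(lyric_features)
        hidden = [math.tanh(h) for h in matvec(self.shared, fused)]
        return [sigmoid(z) for z in matvec(self.heads, hidden)]


def multitask_loss(probabilities, targets, eps=1e-12):
    # Sum of per-tag binary cross-entropy terms, one term per task.
    return sum(
        -(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps))
        for p, t in zip(probabilities, targets)
    )


# Illustrative inputs only; these are not real audio or lyric features.
tagger = MultiTaskTagger(audio_dim=4, lyric_dim=3, hidden_dim=5, num_tags=2)
probs = tagger.forward([0.2, -0.1, 0.4, 0.0], [1.0, 0.5, -0.3])
loss = multitask_loss(probs, [1, 0])
```

Because all tag outputs share the same hidden layer, a gradient step on the summed loss would update that layer using every tag at once, which is one common way for a model to pick up correlations between tags.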

Music autotagging as captioning

audeosynth: music-driven video montage

A method of music autotagging based on audio and lyrics

Automatic Music Emotion Classification Using a New Classification Algorithm

ALCAP: Alignment-Augmented Music Captioner

LP-MusicCaps: LLM-Based Pseudo Music Captioning

Event Localization in Music Auto-tagging

Music Auto-Tagging with Robust Music Representation Learned via Domain Adversarial Training

Automatic music emotion classification model for movie soundtrack subtitling based on neuroscientific premises

Unaligned Supervision For Automatic Music Transcription in The Wild

Evaluation of CNN-based Automatic Music Tagging Models

Music auto-tagging in the long tail: A few-shot approach

Exploiting Device and Audio Data to Tag Music with User-Aware Listening Contexts

Hierarchical Attentive Deep Neural Networks for Semantic Music Annotation Through Multiple Music Representations

Perceptual Musical Features for Interpretable Audio Tagging

MusiCoder: A Universal Music-Acoustic Encoder Based on Transformers

Joint Music and Language Attention Models for Zero-shot Music Tagging

Automated Audio Captioning with Recurrent Neural Networks

Semantic Music Annotation by Label-Specific Conditional Random Fields

Collective Annotation of Music from Multiple Semantic Categories

Futga: Towards Fine-grained Music Understanding through Temporally-enhanced Generative Augmentation