Abstract:In this paper, the residual convolutional neural network is used to extract the note features in the music score image to solve the problem of model degradation; then, multiscale feature fusion is used to fuse the feature information of different levels in the same feature map to enhance the feature representation ability of the model. A network composed of a bidirectional simple loop unit and a chained time series classification function is used to identify notes, parallelizing a large number of calculations, thereby speeding up the convergence speed of training, which also makes the data in the dataset no longer need to be strict with labels. Alignment also reduces the requirements on the dataset. Aiming at the problem that the existing cross-modal retrieval methods based on common subspace are insufficient for mining local consistency within modalities, a cross-modal retrieval method fused with graph convolution is proposed. The K-nearest neighbor algorithm is used to construct modal graphs for samples of different modalities, and the original features of samples from different modalities are encoded through a symmetric graph convolutional coding network and a symmetric multilayer fully connected coding network, and the encoded features are fused and input. We jointly optimize the intramodal semantic constraints and intermodal modality-invariant constraints in the common subspace to learn highly locally consistent and semantically consistent common representations for samples from different modalities. The error value of the experimental results is used to illustrate the effect of parameters such as the number of iterations and the number of neurons on the network. In order to more accurately illustrate that the generated music sequence is very similar to the original music sequence, the generated music sequence is also framed, and finally the music sequence spectrogram and spectrogram are generated. The accuracy of the experiment is illustrated by comparing the spectrogram and the spectrogram, and genre classification predictions are also performed on the generated music to show that the network can generate music of different genres.

A Deep Neural Network for Modeling Music

Modeling of the Latent Embedding of Music using Deep Neural Network

Music Waveform Analysis Based on SOM Neural Network and Big Data

Construction of Music Intelligent Creation Model Based on Convolutional Neural Network

Hierarchical Attentive Deep Neural Networks for Semantic Music Annotation Through Multiple Music Representations

Representations of Sound in Deep Learning of Audio Features from Music

Music Style Classification Algorithm Based on Music Feature Extraction and Deep Neural Network

Mode-conditioned music learning and composition: a spiking neural network inspired by neuroscience and psychology

N-Gram Unsupervised Compoundation and Feature Injection for Better Symbolic Music Understanding

Robust Multipitch Estimation Of Piano Sounds Using Deep Spiking Neural Networks

Music Feature Extraction and Classification Algorithm Based on Deep Learning

Multimodel Music Emotion Recognition Using Unsupervised Deep Neural Networks

Music Similarity Detection Guided by Deep Learning Model

Design of Neural Network Model for Cross-Media Audio and Video Score Recognition Based on Convolutional Neural Network Model

Deep Neural Network for Musical Instrument Recognition using MFCCs

Model-Based Deep Learning for Music Information Research

Audio-Based Music Classification with DenseNet And Data Augmentation

Deep convolutional neural networks for predominant instrument recognition in polyphonic music

BLNN:a muscular and tall architecture for emotion prediction in music

Music Recommendation System and Recommendation Model Based on Convolutional Neural Network

Research Based on the Application and Exploration of Artificial Intelligence in the Field of Traditional Music