Abstract: Digital video is widely used to record people's daily lives and share their moods, but few researchers have studied whether short videos and music express emotion consistently. To match suitable background music to short video images autonomously and efficiently, this paper analyzed the emotional connection between the two from the perspective of audio-visual synesthesia. First, emotional semantics was used as a bridge between video data and music data, and a video-music synesthesia dataset based on semantic words was constructed. Then, an attention mechanism was incorporated to better extract key features from video images. For music feature extraction, an improved LeNet-5 network was used, and the optimal network parameters were determined experimentally. Finally, the two types of features were fused, and mutual retrieval between video and music was performed. To compare the performance of different models on video images, several CNN models were evaluated, including VGG16, VGG19, AlexNet and GoogLeNet, and the attention mechanism was added to each network to compare retrieval accuracy. For music data, different CNN algorithms were likewise compared, and networks with different numbers of layers were tested to find the best configuration. The experimental results show that the emotion-based audio-visual synesthesia retrieval model can effectively measure the emotional similarity between video images and music, and that the proposed method produces good matches between them. This research explores computer synesthetic intelligence and can stimulate the creative inspiration of image and music designers. Besides enhancing the emotional experience of digital products, it also improves the efficiency and quality of their development.
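The abstract does not state how the fused features are compared during mutual retrieval. As a minimal sketch, assuming video and music clips have already been projected into a shared emotion space (one score per emotional semantic word), the idea can be pictured as dot-product attention pooling over per-frame features followed by cosine-similarity ranking in either direction. The attention query, the three emotion words, the clip names and all numbers are hypothetical, and cosine similarity is a stand-in choice, not necessarily the measure the paper uses.

```python
import math
from typing import Dict, List, Sequence

# All vectors below live in a hypothetical 3-D emotion space whose dimensions
# are scores for the semantic words "joyful", "sad", "calm" (illustrative only).


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames: Sequence[Sequence[float]], query: Sequence[float]) -> List[float]:
    """Pool per-frame feature vectors into one clip vector with dot-product attention.

    Each frame is scored against a query vector (learned in a real model,
    hand-picked here); softmax turns the scores into weights, so frames that
    match the query contribute more to the pooled vector.
    """
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    return [sum(w * frame[d] for w, frame in zip(weights, frames)) for d in range(dim)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query: Sequence[float], gallery: Dict[str, Sequence[float]], k: int = 3) -> List[str]:
    """Return the k gallery keys most similar to the query, best first.

    The same function serves both directions of mutual retrieval:
    a video vector against a music gallery, or a music vector against a video gallery.
    """
    ranked = sorted(gallery, key=lambda key: cosine(query, gallery[key]), reverse=True)
    return ranked[:k]


# Toy data: three frames of a short video, already in the emotion space.
frames = [[0.9, 0.1, 0.2], [0.8, 0.0, 0.3], [0.1, 0.7, 0.1]]
query = [1.0, 0.0, 0.0]  # hypothetical attention query favouring "joyful" frames
video_vec = attention_pool(frames, query)

# Toy music gallery in the same emotion space (values are made up).
music = {
    "upbeat_pop": [0.9, 0.05, 0.2],
    "slow_piano": [0.1, 0.8, 0.4],
    "ambient": [0.2, 0.1, 0.9],
}
print(retrieve(video_vec, music, k=3))  # ['upbeat_pop', 'ambient', 'slow_piano']

# Reverse direction: find the video closest to a music clip.
videos = {"beach_trip": video_vec, "rainy_day": [0.1, 0.9, 0.3]}
print(retrieve(music["slow_piano"], videos, k=1))  # ['rainy_day']
```

Because both modalities share one space, the same `retrieve` function covers video-to-music and music-to-video search. In a real system the frame vectors would come from a CNN image backbone and the music vectors from the music network rather than being written by hand.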

Audio scene semantic similarity computing approach

Quantitative similarity computing for audio effect semantic in video content analysis

AudioScenic: Audio-Driven Video Scene Editing

Audio Similarity Measure by Graph Modeling and Matching

Unsupervised Auditory Scene Categorization Via Key Audio Effects And Information-Theoretic Co-Clustering

Inspection of Video Frequency Scene Based on Audio Frequency Analysis

Semantic Similarity Score for Measuring Visual Similarity at Semantic Level

Efficient Video to Audio Mapper with Visual Scene Detection

Improving Semantic Scene Categorization by Exploiting Audio-Visual Features

Audio Segmentation in AAC Domain for Content Analysis

A Mid-Level Scene Change Representation Via Audiovisual Alignment

Research on Emotional Semantic Retrieval of Attention Mechanism Oriented to Audio-visual Synesthesia

Study on Linguistic Computing for Music Emotion

QDFormer: Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition

Object Segmentation with Audio Context

A Statistics-Based Method For Video Semantic Analysis

Audio-Visual Segmentation with Semantics

Audio and Video Combined for Home Video Abstraction

Audio retrieval based on perceptual similarity

A Two-Stage Content-Based Audio Segmentation Algorithm