Abstract: A cochlear implant (CI) is a surgically implanted electronic device that partially restores hearing to people with profound hearing loss. Although CI users generally achieve very good reception of continuous speech in the absence of background noise, they face severe limitations in music perception and appreciation. These limitations stem mainly from channel interactions, caused by the broad spread of electrical fields in the cochlea, and from the low number of electrodes that stimulate it. Moreover, CIs are severely limited in transmitting the temporal fine structure of acoustic signals, so these devices elicit poor pitch and timbre perception. For these reasons, several signal processing algorithms have been proposed to make music more accessible to CI users. These algorithms either reduce the complexity of music signals or remix them to enhance certain components, such as the lead singing voice. This work presents a deep neural network that performs real-time audio source separation to remix music for CI users. The implementation is based on a multi-layer perceptron (MLP) and was evaluated using objective instrumental measurements to ensure clean source estimation. Furthermore, experiments were conducted with 10 normal-hearing (NH) listeners and 13 CI users to investigate how the vocals-to-instruments ratio (VIR) set by the listeners was affected in realistic environments with and without visual information. The objective instrumental results meet the benchmark reported in previous studies, introducing distortions that have been shown not to be perceived by CI users. Moreover, the implemented model was optimized to perform real-time source separation. The experimental results show that CI users prefer the vocals enhanced by 8 dB relative to the instruments, independent of the acoustic scenario and of visual information. In contrast, NH listeners did not prefer a VIR different from 0 dB.
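The abstract reports a preferred VIR of about 8 dB for CI users but does not say how the VIR is computed or how the remix is formed. The sketch below assumes that the VIR is the ratio of the RMS levels of the separated vocal and instrument stems, expressed in dB. It also assumes that the remix is built by scaling only the vocal stem and summing it with the instruments. The function names `rms`, `vir_db`, `vocal_gain` and `remix` are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def rms(x: Sequence[float]) -> float:
    """Root-mean-square level of a signal (assumed level measure)."""
    if len(x) == 0:
        raise ValueError("signal must not be empty")
    return math.sqrt(sum(s * s for s in x) / len(x))


def vir_db(vocals: Sequence[float], instruments: Sequence[float]) -> float:
    """Vocals-to-instruments ratio in dB, assumed here to be an RMS level ratio."""
    rv, ri = rms(vocals), rms(instruments)
    if rv == 0.0 or ri == 0.0:
        raise ValueError("VIR is undefined for a silent stem")
    return 20.0 * math.log10(rv / ri)


def vocal_gain(vocals: Sequence[float], instruments: Sequence[float],
               target_vir_db: float) -> float:
    """Linear gain for the vocal stem that moves the current VIR to the target."""
    gain_db = target_vir_db - vir_db(vocals, instruments)
    return 10.0 ** (gain_db / 20.0)


def remix(vocals: Sequence[float], instruments: Sequence[float],
          target_vir_db: float) -> List[float]:
    """Scale the (separated) vocal stem to the target VIR and sum it with the instruments."""
    if len(vocals) != len(instruments):
        raise ValueError("stems must have the same length")
    g = vocal_gain(vocals, instruments, target_vir_db)
    return [g * v + i for v, i in zip(vocals, instruments)]


# Hypothetical usage: vocals 6 dB below the instruments, remixed to +8 dB.
vocals = [0.1, -0.1, 0.1, -0.1]
instruments = [0.2, -0.2, 0.2, -0.2]
mix = remix(vocals, instruments, target_vir_db=8.0)
```

For simplicity, this sketch works on whole clips. In a real-time system like the one described, the same gain would more likely be applied frame by frame to the separator's stem estimates. A target of 0 dB corresponds to the balanced setting that the NH listeners preferred.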

Remixing Music with Visual Conditioning

Mind Band: A Crossmedia AI Music Composing Platform

Design and Evaluation of a Real-Time Audio Source Separation Algorithm to Remix Music for Cochlear Implant Users

A versatile deep-neural-network-based music preprocessing and remixing scheme for cochlear implant listeners

Listen and Look: Audio–Visual Matching Assisted Speech Source Separation

A Unified Model for Zero-shot Music Source Separation, Transcription and Synthesis

Remixing-based Unsupervised Source Separation from Scratch

Multi-Modal Music Information Retrieval: Augmenting Audio-Analysis with Visual Computing for Improved Music Video Analysis

Self-Remixing: Unsupervised Speech Separation via Separation and Remixing

Self-Supervised Audio-Visual Soundscape Stylization

Conditional Generation of Audio from Video via Foley Analogies

Audio Conditioning for Music Generation via Discrete Bottleneck Features

Arrange, Inpaint, and Refine: Steerable Long-term Music Audio Generation and Editing via Content-based Controls

Emotion Manipulation Through Music -- A Deep Learning Interactive Visual Approach

MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models

Music source separation conditioned on 3D point clouds

Music Remixing and Upmixing Using Source Separation

Sound to Visual Scene Generation by Audio-to-Visual Latent Alignment

Vision-Infused Deep Audio Inpainting

Music Separation Enhancement with Generative Modeling

Remixing Music for Hearing Aids Using Ensemble of Fine-Tuned Source Separators