Abstract: Visual and audio quality assessment (QA) have been widely researched for decades, yet nearly all of this prior work has focused on single-mode visual or audio signals. However, visual signals are rarely presented without accompanying audio, including in heavy-bandwidth video streaming applications. Moreover, the distortions that may separately (or jointly) afflict the visual and audio signals collectively shape user-perceived quality of experience (QoE). This motivated us to conduct a subjective study of audio and video (A/V) quality, which we then used to develop and compare A/V quality measurement models and algorithms. The new LIVE-SJTU Audio and Video Quality Assessment (A/V-QA) Database includes 336 A/V sequences that were generated from 14 original source contents by applying 24 different A/V distortion combinations to each of them. We conducted a subjective A/V quality perception study on the database to better understand how humans perceive the overall combined quality of A/V signals. We also designed four families of objective A/V quality prediction models using a multimodal fusion strategy. The families differ both in the unimodal audio and video quality prediction models that make the direct signal measurements and in the way the two perceptual signal modes are combined. The objective models are built from existing state-of-the-art audio and video quality prediction models, some new prediction models, and quality-predictive features delivered by a deep neural network. The methods considered for fusing audio and video quality predictions include simple product combinations as well as learned mappings. Using the new subjective A/V database as a tool, we validated and tested all of the objective A/V quality prediction models. We will make the database publicly available to facilitate further research.
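The two fusion styles named in the abstract, a simple product combination and a learned mapping, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names are hypothetical, the scores are toy values rather than data from the database, and ordinary least-squares regression stands in for the learned mapping, whose actual form the abstract does not specify.

```python
import numpy as np

def product_fusion(video_q, audio_q):
    """Fuse per-clip unimodal scores by elementwise multiplication."""
    return np.asarray(video_q, dtype=float) * np.asarray(audio_q, dtype=float)

def fit_linear_fusion(video_q, audio_q, mos):
    """Fit mos ~ w0 + w1*video + w2*audio by least squares.

    A stand-in for a learned mapping (assumption, not the paper's model).
    """
    mos = np.asarray(mos, dtype=float)
    X = np.column_stack([np.ones(len(mos)), video_q, audio_q])
    weights, *_ = np.linalg.lstsq(X, mos, rcond=None)
    return weights

def predict_linear_fusion(weights, video_q, audio_q):
    """Apply the fitted weights to new unimodal scores."""
    X = np.column_stack([np.ones(len(video_q)), video_q, audio_q])
    return X @ weights

# Toy, made-up scores for four clips (not taken from the database).
video_scores = np.array([0.9, 0.6, 0.8, 0.3])
audio_scores = np.array([0.7, 0.9, 0.4, 0.5])
subjective_mos = np.array([70.0, 62.0, 48.0, 25.0])

fused_product = product_fusion(video_scores, audio_scores)
w = fit_linear_fusion(video_scores, audio_scores, subjective_mos)
fused_learned = predict_linear_fusion(w, video_scores, audio_scores)
```

The product form needs no training but assumes both unimodal scores lie on comparable, non-negative scales, whereas a learned mapping requires subjective scores to fit and must be evaluated on held-out clips.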

ViSQOL v3: An Open Source Production Ready Objective Speech and Audio Metric

AQP: An Open Modular Python Platform for Objective Speech and Audio Quality Metrics

A Novel Non-Intrusive Objective Speech Quality Measurement Based on GMM and SVR

TorchAudio-Squim: Reference-less Speech Quality and Intelligibility measures in TorchAudio

SAQAM: Spatial Audio Quality Assessment Metric

Non-intrusive Objective Speech Quality Measurement Based on GMM and SVR for Narrowband and Wideband Speech

SpeechQoE

ODAQ: Open Dataset of Audio Quality

AMBIQUAL: Towards a Quality Metric for Headphone Rendered Compressed Ambisonic Spatial Audio

Objective Speech Quality Assessment With Non-Intrusive Method For Narrowband Speech

OpenACE: An Open Benchmark for Evaluating Audio Coding Performance

Multi-dimensional Speech Quality Assessment in Crowdsourcing

Study of Subjective and Objective Quality Assessment of Audio-Visual Signals

Speech quality estimation with deep lattice networks

Signal Quality Auditing for Time-series Data

DNSMOS P.835: A Non-Intrusive Perceptual Objective Speech Quality Metric to Evaluate Noise Suppressors

VERSA: A Versatile Evaluation Toolkit for Speech, Audio, and Music

On Crowdsourcing-design with Comparison Category Rating for Evaluating Speech Enhancement Algorithms

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

Enabling Auditory Large Language Models for Automatic Speech Quality Evaluation

Self-Supervised Speech Quality Estimation and Enhancement Using Only Clean Speech