Abstract: Content-based speech retrieval, an active research topic in speech recognition, detects speech that shares the same content features. It improves computer intelligence while reducing labor costs and has therefore been widely used. Most current speech content retrieval algorithms perform well on small-scale retrieval tasks, but their performance drops sharply when the stored speech data is large and its content is highly redundant. To address these problems, this paper proposes a high-performance speech BioHashing retrieval algorithm based on audio segmentation. The algorithm consists of an offline pre-processing phase and an online retrieval phase. The offline pre-processing phase converts the speech data into BioHashing sequences that carry speech content characteristics. First, the Power-Normalized Cepstral Coefficients (PNCC) features of the speech data are extracted, and biometric templates with single mapping keys are constructed from the PNCC features to obtain the BioHashing sequences. Then the original speech is sliced into short-time audio segments by the proposed audio segmentation algorithm, and a hash reconstruction operation is performed on the BioHashing sequences to obtain the reconstructed hash sequences used for online retrieval. The online retrieval phase responds to users' query requests: it finds the hash index that matches the query hash sequence in the BioHashing index table and returns to the user, as the retrieval result, the entry whose standardized edit distance (SED) value is closest to 1. Experimental results show that the reconstructed hash sequences obtained after removing the silent redundant segments have better robustness and discrimination. Moreover, the algorithm achieves 100% retrieval accuracy for the original speech clips, with an average retrieval time of only 0.0157 s, which indicates that it has good retrieval performance and can meet the needs of speech retrieval in various environments.
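
The two phases described above can be illustrated with a minimal Python sketch. It is not the paper's implementation. The function names (`biohash`, `similarity`, `retrieve`), the key-seeded Gaussian projection, the zero threshold, and the similarity score (1 minus the length-normalized Levenshtein distance, so that an exact match scores 1) are all assumptions made for illustration. PNCC extraction, audio segmentation, and hash reconstruction are omitted, and the input is assumed to be an already-computed feature vector.

```python
import random


def biohash(features, key, n_bits=16):
    """Project features onto key-seeded random directions and threshold at 0.

    Assumption: a plain Gaussian projection stands in for the paper's
    biometric template; the key only seeds the random directions.
    """
    rng = random.Random(key)  # same key -> same projection directions
    bits = []
    for _ in range(n_bits):
        direction = [rng.gauss(0.0, 1.0) for _ in features]
        dot = sum(f * d for f, d in zip(features, direction))
        bits.append("1" if dot > 0 else "0")
    return "".join(bits)


def edit_distance(a, b):
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Assumed score: 1 - edit_distance / max length (1.0 means identical)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def retrieve(query_hash, index_table):
    """Return the name of the indexed hash whose score is closest to 1."""
    return max(index_table, key=lambda name: similarity(query_hash, index_table[name]))


if __name__ == "__main__":
    # Hypothetical feature vectors; real input would be PNCC features.
    clips = {"clip_a": [0.2, -1.3, 0.7, 0.1], "clip_b": [-0.9, 0.4, -0.2, 1.5]}
    table = {name: biohash(vec, key=42) for name, vec in clips.items()}
    query = biohash(clips["clip_a"], key=42)
    print(retrieve(query, table))
```

In this sketch the offline phase corresponds to building `table` once, and the online phase to hashing the query with the same key and calling `retrieve`. A linear scan over the table is used for brevity; it does not reflect how the paper's index table is organized.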

A high-performance speech BioHashing retrieval algorithm based on audio segmentation
