Abstract: In telecommunications and cloud communications, detecting accurately and in real time whether a human or an answering machine has answered an outbound call is critically important. The problem matters most during outbound campaigns, where precise identification of the answering party improves service quality and efficiency and reduces costs. Despite its significance, the problem remains underexplored in the existing literature. This paper presents an approach to answering machine detection that uses transfer learning, with the YAMNet model serving as a feature extractor. The extracted features are used to train a recurrent classifier, which enables real-time processing of audio streams rather than fixed-length recordings. The results demonstrate an accuracy of over 96% on the test set. Furthermore, we conduct an in-depth analysis of misclassified samples and show that an accuracy exceeding 98% can be achieved by integrating a silence detection algorithm, such as the one provided by FFmpeg.

What problem does this paper attempt to address?

This paper addresses the **Answering Machine Detection (AMD) problem**: determining accurately and in real time whether an outbound call has been answered by a real person or by voicemail or an automatic answering machine. The problem matters for telecommunications and cloud communication platforms, especially in marketing campaigns, where solving it can improve service quality and efficiency and reduce costs.

#### Background and Importance

1. **Real-time Operation and Accuracy**: In marketing campaigns, if a call can be classified quickly and accurately as answered by a real person or by voicemail, marketers can play advertising or promotional content only on calls answered by real people and avoid unnecessary call charges.
2. **Limitations of Existing Solutions**: Many proprietary solutions provide AMD functionality, but publicly available research is scarce, and the proprietary solutions usually do not disclose their algorithms or technical details. As a result, transparency and verifiable results are lacking.
3. **Application Scenarios**: AMD is used beyond marketing, for example in call centers, where automatic dialers can skip calls answered by machines and so improve the efficiency of human agents.

#### Main Contributions of the Paper

1. **Review of Current AMD Solutions**: The paper provides a comprehensive review of existing AMD solutions, covering both proprietary software and published research.
2. **A New Deep-Learning Method**: The paper proposes a method based on a Recurrent Neural Network (RNN). Using transfer learning with the YAMNet model for feature extraction, it processes audio streams in real time.
3. **Support for Modern AMD Features**: In addition to improving detection accuracy, the method supports modern AMD features, and it can be combined with a silence detection algorithm (such as the one provided by FFmpeg) to further improve performance.
4. **Flexibility and Scalability**: Users can adjust AMD behavior through hyper-parameters (such as timeout, confidence threshold, and minimum detection time) to suit different application scenarios.

#### Experimental Results

- **Accuracy on the Test Set**: The model achieved an accuracy of 96.67% on the test set. With the silence detection module added, accuracy rose to 98.10%.
- **Real-time Processing**: The average inference time is 31.63 milliseconds per frame, which is short enough for real-time applications.

#### Conclusions and Future Work

The proposed method meets industry standards in performance and gives users a flexible, easily extensible AMD solution. Future research directions include:

- Analyzing the data set by language and region.
- Integrating the silence detection module into the classifier as an additional input.
- Exploring data augmentation techniques.
- Comparing different solutions directly on the same representative data set, considering model accuracy, inference speed, and resource consumption.
- Studying how to deploy stateful models stably in production environments.

With these improvements, the method is expected to see wider use in practical applications.
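
The three hyper-parameters named above (timeout, confidence threshold, and minimum detection time) could interact roughly as in the following minimal sketch. It is an illustration under stated assumptions, not the paper's implementation: the names `AmdConfig` and `decide`, the default values, the frame duration, and the rule for combining the parameters are all hypothetical. The input is assumed to be a stream of per-frame "machine" probabilities produced by some upstream classifier.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class AmdConfig:
    # All defaults are illustrative assumptions, not values from the paper.
    timeout_s: float = 4.0              # give up after this much audio
    confidence_threshold: float = 0.9   # probability needed to commit to a label
    min_detection_s: float = 1.0        # never decide before this much audio
    frame_s: float = 0.48               # assumed audio duration covered per frame


def decide(machine_probs: Iterable[float], cfg: AmdConfig) -> str:
    """Return 'machine', 'human', or 'unknown' for a stream of probabilities.

    Each element of machine_probs is the classifier's probability that the
    call was answered by a machine, emitted once per frame.
    """
    for index, prob in enumerate(machine_probs):
        # Elapsed audio is derived from the frame index to avoid float drift.
        elapsed = (index + 1) * cfg.frame_s
        if elapsed >= cfg.min_detection_s:
            if prob >= cfg.confidence_threshold:
                return "machine"
            if 1.0 - prob >= cfg.confidence_threshold:
                return "human"
        if elapsed >= cfg.timeout_s:
            return "unknown"
    # The stream ended before a confident decision or the timeout.
    return "unknown"


if __name__ == "__main__":
    cfg = AmdConfig(timeout_s=3.0, min_detection_s=1.0, frame_s=0.5)
    print(decide([0.95, 0.95, 0.2], cfg))
```

In this sketch, a lower confidence threshold or shorter minimum detection time trades accuracy for earlier decisions, while the timeout bounds how long a caller waits before the system falls back to an undecided result.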

A Recurrent Neural Network Approach to the Answering Machine Detection Problem
