Developing automatic verbatim transcripts for international multilingual meetings: an end-to-end solution

Akshat Dewan, Michal Ziemski, Henri Meylan, Lorenzo Concina, Bruno Pouliquen
DOI: https://doi.org/10.48550/arXiv.2309.15609
2023-09-27
Abstract: This paper presents an end-to-end solution for the creation of fully automated conference meeting transcripts and their machine translations into various languages. This tool has been developed at the World Intellectual Property Organization (WIPO) using in-house developed speech-to-text (S2T) and machine translation (MT) components. Beyond describing data collection and fine-tuning, resulting in a highly customized and robust system, this paper describes the architecture and evolution of the technical components and highlights the business impact and benefits from the user side. We also point out particular challenges in the evolution and adoption of the system and how the new approach created a new product and replaced existing established workflows in conference management documentation.
Computation and Language, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The paper addresses the automatic generation of verbatim records, and their machine translations, for international multilingual conferences. The World Intellectual Property Organization (WIPO) holds many multilingual conferences each year, and the speeches delivered at them must be documented as verbatim records and translated into multiple languages. Producing these records and translations manually is costly and slow, often taking several months. The paper therefore proposes an end-to-end automated solution built on internally developed Speech-to-Text (S2T) and Machine Translation (MT) technologies, which produces high-quality automatic verbatim records and their translations shortly after a conference ends. This significantly reduces processing time, lowers costs, and improves the user experience.

The main contributions of the paper are:

1. **Data Collection and Model Training**: a detailed description of how multilingual conference data is collected and processed, and how the S2T and MT models are trained and optimized.
2. **System Architecture and Integration**: an overview of the system architecture and its technical components, including audio segmentation, text alignment, data augmentation, and other processing steps.
3. **Business Impact and User Feedback**: an evaluation of the new system's business impact, including improved user satisfaction, cost savings, and a better user experience.
4. **Future Work Directions**: a discussion of possible further improvements, such as better accuracy on domain-specific terminology and support for more languages.

With this solution, WIPO improves the efficiency and quality of its conference records and offers other international organizations a reference for the technology and methods involved.
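The summary names the pipeline stages (audio segmentation, S2T, then MT into several target languages) without showing how they connect. The following is only a minimal sketch of that data flow under stated assumptions, not WIPO's implementation: `Segment`, `segment_audio`, `build_verbatim`, and the stub engines `fake_s2t` and `fake_mt` are all hypothetical, and the fixed-length windowing stands in for whatever segmentation the real system uses (which would more plausibly cut on pauses or speaker turns).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Segment:
    start: float  # segment start time in seconds
    end: float    # segment end time in seconds
    text: str = ""


def segment_audio(duration_s: float, max_len_s: float = 30.0) -> List[Segment]:
    """Hypothetical segmenter: split a recording into fixed-length windows.

    Assumption: a real system would cut on pauses or speaker changes instead.
    """
    segments: List[Segment] = []
    start = 0.0
    while start < duration_s:
        end = min(start + max_len_s, duration_s)
        segments.append(Segment(start, end))
        start = end
    return segments


def build_verbatim(
    duration_s: float,
    transcribe: Callable[[float, float], str],  # hypothetical S2T engine
    translate: Callable[[str, str], str],       # hypothetical MT engine
    target_langs: List[str],
    max_len_s: float = 30.0,
) -> Dict[str, str]:
    """Chain segmentation -> S2T -> MT and return one record per language."""
    segments = segment_audio(duration_s, max_len_s)
    for seg in segments:
        seg.text = transcribe(seg.start, seg.end)
    # Assemble the source-language verbatim record from the segment transcripts.
    source = " ".join(s.text for s in segments if s.text)
    records = {"source": source}
    for lang in target_langs:
        records[lang] = translate(source, lang)
    return records


# Stub engines standing in for the in-house S2T and MT components.
def fake_s2t(start: float, end: float) -> str:
    return f"[{start:.0f}-{end:.0f}s]"


def fake_mt(text: str, lang: str) -> str:
    return f"<{lang}> {text}"


records = build_verbatim(70.0, fake_s2t, fake_mt, ["fr", "es"])
print(records["source"])  # [0-30s] [30-60s] [60-70s]
print(records["fr"])      # <fr> [0-30s] [30-60s] [60-70s]
```

Passing the S2T and MT engines in as callables keeps the orchestration independent of any particular model, so the stubs can be swapped for real components without changing the pipeline logic.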