News Reporter: A Multi-lingual LLM Framework for Broadcast T.V News

Tarun Jain, Yufei Gao, Sridhar Vanga, Karan Singla

2024-10-10

Abstract: Large Language Models (LLMs) have fast become essential tools for many conversational chatbots due to their ability to provide coherent answers to varied queries. Datasets used to train these LLMs are often a mix of generic and synthetic samples, thus lacking the verification needed to provide correct and verifiable answers for T.V. news. We collect and share a large collection of QA pairs extracted from transcripts of news recordings from various news channels across the United States. The resulting QA pairs are then used to fine-tune an off-the-shelf LLM. Our model surpasses base models of similar size on several open LLM benchmarks. We further propose and integrate a RAG method to improve the contextualization of our answers and to point each answer to a verifiable news recording.

Computation and Language

What problem does this paper attempt to address?

This paper addresses the unreliability of existing large language models (LLMs) when answering questions about television news. Although these models handle a wide range of queries well, their training data is a mix of generic and synthetic samples that lacks verification, so their answers to news-related questions are often unreliable. The authors therefore propose News Reporter, a multilingual LLM framework for broadcast television news, which fine-tunes an existing LLM on question-answer pairs extracted from real news recordings.

The main contributions of the paper are:

1. **Constructing a multilingual question-answer pair dataset**: A large number of question-answer pairs were extracted from transcripts of real news recordings from multiple news channels, and a vector database was provided to validate each answer.
2. **Fine-tuning an existing LLM**: The extracted question-answer pairs were used to fine-tune an off-the-shelf pre-trained LLM so that it better understands queries, retrieves relevant news recordings, and generates appropriate answers.
3. **Evaluating model performance**: On standard LLM benchmarks and custom evaluation sets, the fine-tuned model improved over base models of similar size in answering broadcast news-related questions.

Together, these steps aim to make LLM answers to broadcast television news questions more reliable and traceable to a verifiable news recording.
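The retrieval step described above, in which a query is matched against stored transcript segments so that the answer can point to a specific recording, can be illustrated with a small sketch. This is not the authors' implementation: the bag-of-words `embed` function stands in for a learned embedding model, the in-memory `SEGMENTS` list stands in for a vector database, and every name, recording id, and transcript string here is hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a learned embedding)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical transcript segments, each tied to a recording identifier.
SEGMENTS = [
    {"recording_id": "rec-001",
     "text": "the city council approved the new transit budget on tuesday"},
    {"recording_id": "rec-002",
     "text": "a winter storm warning is in effect for the northern counties"},
    {"recording_id": "rec-003",
     "text": "local high school wins the state basketball championship"},
]


def retrieve(query, segments, k=1):
    """Return the k segments most similar to the query."""
    q = embed(query)
    ranked = sorted(segments,
                    key=lambda s: cosine(q, embed(s["text"])),
                    reverse=True)
    return ranked[:k]


def build_prompt(query, hits):
    """Assemble a prompt that carries the retrieved text and its source ids."""
    context = "\n".join(f"[{h['recording_id']}] {h['text']}" for h in hits)
    return (f"Context:\n{context}\n\n"
            f"Question: {query}\nAnswer (cite the recording id):")


hits = retrieve("what did the city council approve", SEGMENTS)
prompt = build_prompt("what did the city council approve", hits)
```

Keeping the recording id attached to each retrieved segment is what lets a generated answer be traced back to a source; a production system would presumably replace the toy similarity with dense embeddings and an approximate nearest-neighbor index.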

LlamaLens: Specialized Multilingual LLM for Analyzing News and Social Media Content

NewsInterview: a Dataset and a Playground to Evaluate LLMs' Ground Gap via Informational Interviews

Survey of different Large Language Model Architectures: Trends, Benchmarks, and Challenges

Large language model based framework for knowledgebase coverage and correctness using chatbot and human feedback

Comuniqa : Exploring Large Language Models for improving speaking skills

Supervised Knowledge Makes Large Language Models Better In-context Learners

Developing Story: Case Studies of Generative AI's Use in Journalism

Spoken Language Intelligence of Large Language Models for Language Learning

Re-Search for The Truth: Multi-round Retrieval-augmented Large Language Models are Strong Fake News Detectors

Code-mixed LLM: Improve Large Language Models' Capability to Handle Code-Mixing through Reinforcement Learning from AI Feedback

Benchmarking Large Language Models for News Summarization

Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions

Leveraging LLMs for Dialogue Quality Measurement

Bridging the Gap: Dynamic Learning Strategies for Improving Multilingual Performance in LLMs

Knowledge Bases in Support of Large Language Models for Processing Web News

Large Language Model Agent for Fake News Detection

Several categories of Large Language Models (LLMs): A Short Survey

AI-Press: A Multi-Agent News Generating and Feedback Simulation System Powered by Large Language Models

Tele-FLM Technical Report

LLaVaOLMoBitnet1B: Ternary LLM goes Multimodal!