Abstract: This paper addresses keyword extraction from conversations, with the goal of using these keywords to retrieve, for each short conversation fragment, a small number of potentially relevant documents that can be recommended to participants. However, even a short fragment contains a variety of words that are potentially related to several topics; moreover, using an automatic speech recognition (ASR) system introduces errors among them. It is therefore difficult to infer precisely the information needs of the conversation participants. We first propose an algorithm to extract keywords from the output of an ASR system (or from a manual transcript for testing), which uses topic modeling techniques and a submodular reward function that favors diversity in the keyword set, in order to match the potential diversity of topics and reduce ASR noise. Then, we propose a method to derive multiple topically separated queries from this keyword set, in order to maximize the chances of making at least one relevant recommendation when using these queries to search the English Wikipedia. The proposed methods are evaluated in terms of relevance with respect to conversation fragments from the Fisher, AMI, and ELEA conversational corpora, rated by several human judges. The scores show that our proposal improves over previous methods that consider only word frequency or topic similarity, and represents a promising solution for a document recommender system to be used in conversations.
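
The abstract does not give the exact reward function, so the following is only a minimal sketch of how a diversity-favoring submodular reward can be maximized greedily. It assumes a reward of the form sum over topics z of weight_z * (sum over selected words w of p(z|w)) ** power, with 0 < power < 1, so that each additional word from an already covered topic earns a diminishing return. The function name `greedy_diverse_keywords`, the `power` parameter, and the toy topic distributions are all hypothetical and chosen for illustration; in practice p(z|w) and the topic weights would come from a topic model applied to the conversation fragment.

```python
from typing import Dict, List


def greedy_diverse_keywords(
    word_topics: Dict[str, List[float]],  # assumed p(z|w) for each candidate word
    topic_weights: List[float],           # assumed importance of each topic in the fragment
    k: int,
    power: float = 0.5,                   # assumed concavity exponent, 0 < power < 1
) -> List[str]:
    """Greedily select up to k words maximizing an assumed submodular reward."""
    selected: List[str] = []
    # Accumulated p(z|w) per topic over the words selected so far.
    topic_mass = [0.0] * len(topic_weights)

    def reward(mass: List[float]) -> float:
        # Concave power => diminishing returns for repeatedly covering one topic.
        return sum(b * (m ** power) for b, m in zip(topic_weights, mass))

    candidates = set(word_topics)
    while candidates and len(selected) < k:
        current = reward(topic_mass)
        best_word, best_gain = None, float("-inf")
        for w in sorted(candidates):  # sorted for deterministic tie-breaking
            new_mass = [m + p for m, p in zip(topic_mass, word_topics[w])]
            gain = reward(new_mass) - current
            if gain > best_gain:
                best_word, best_gain = w, gain
        selected.append(best_word)
        candidates.remove(best_word)
        topic_mass = [m + p for m, p in zip(topic_mass, word_topics[best_word])]
    return selected


# Toy example: two topics, the first one dominant in the fragment.
toy_word_topics = {
    "budget": [1.0, 0.0],
    "cost": [1.0, 0.0],
    "remote": [0.0, 1.0],
}
toy_keywords = greedy_diverse_keywords(toy_word_topics, [0.6, 0.4], k=2)
print(toy_keywords)
```

In this toy setting, a ranking based only on similarity to the dominant topic would pick both "budget" and "cost", whereas the concave reward makes the second word of an already covered topic less valuable than the first word of the minor topic.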

Keyword Extraction And Headline Generation Using Novelword Features

Exploring Simultaneous Keyword and Key Sentence Extraction: Improve Graph-based Ranking Using Wikipedia

Exploring Simultaneous Keyword and Key Sentence Extraction

Wikipedia Based Approach for Clustering Keyword of Reviews

Single Document Keyword Extraction for Internet News Articles

Keyword Extraction Based on Tf/idf for Chinese News Document

A User-Oriented Special Topic Generation System for Digital Newspaper

News-oriented Automatic Chinese Keyword Indexing

Keyword extraction and clustering for document recommendation in conversations

Cross Domain Search by Exploiting Wikipedia

Keyword extraction using support vector machine

Keyword Extraction using the Word Co-occurrence Network Properties that is Independent of Languages and Document Types and Its Evaluation by Prediction of Headline Words

Exploiting Wikipedia Priori Knowledge for Chinese Named Entity Recognition

Keyword-propagation-based information enriching and noise removal for web news videos

A Novel Keyword Generation Model Based on Topic-Aware and Title-Guide

Keyword Extraction Based on Lexical Chains and Word Co-occurrence for Chinese News Web Pages

Chinese Keyword Extraction Algorithm Based on Neighbour Words

A novel approach for building Domain-specific Lexical Repository with Chinese Wikipedia

Improving Keyphrase Extraction Using Wikipedia Semantics

Automatic Keywords Extraction Based on Co-Occurrence and Semantic Relationships Between Words

Exploring Wikipedia and query log's ability for text feature representation