Abstract: Reusing redundant information in question-answer pairs is one alternative approach to building question answering (QA) systems. If the same question has been asked by other users, the QA system responds to it using the answer associated with the redundant question. Nevertheless, the task of identifying similar questions is not trivial. Traditional text similarity measures are neither effective nor efficient in distinguishing the similarity of sentence-level text. Document similarity techniques are not effective since sentence-level text is rather short and contains very little word overlap. Furthermore, the similarity and relevance of sentences can be characterized at different levels, which differs from the standard notion of topicality used in document retrieval. In this paper, we focus on the problem of identifying questions that express the same information need. The main goal is to match questions with their paraphrases. To achieve this, we propose a hybrid question similarity approach that combines semantic, syntactic, and question type similarity. Semantic and syntactic information is measured by taking into account word similarity, word ordering, and part-of-speech information. Information about question types is derived from a Support Vector Machine classifier. The experimental results show that our approach is highly effective in detecting redundant questions.

1. INTRODUCTION

For many years, knowledge-sharing community sites, such as Yahoo! Answers, have been accepting a large number of questions from millions of users. Given the current magnitude of questions and answers in their archives, it is plausible that a newly submitted question has already been asked by other users. However, finding such similar questions is ineffective due to the inherent limitations of current search engines.
Standard text retrieval approaches that compute the similarity of document-level text are neither effective nor efficient for matching natural language questions. First, the fundamental principle of document similarity techniques is based on the degree of word overlap. This notion works well in distinguishing similar documents since they are likely to contain a sufficient number of words in common. Questions, on the other hand, are relatively short and often share very little word overlap. Furthermore, due to the generative power of natural language, the same question can be expressed in various ways. Hence, most questions are likely to receive a low similarity score from document similarity measures. The notion of topical relevance, which is central to standard information retrieval systems, …
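To illustrate the word-overlap limitation described above, the following minimal sketch scores question pairs with Jaccard overlap on word sets. Jaccard is used here only as a stand-in for an overlap-based document similarity measure; it is not the paper's method. The function names `tokens` and `jaccard` and the three example questions are assumptions made for illustration.

```python
import re


def tokens(text):
    """Lowercase a string and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Word-overlap similarity: |A intersect B| / |A union B| over word sets."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0  # convention chosen here for two empty inputs
    return len(ta & tb) / len(ta | tb)


# Hypothetical questions, invented for this example.
q1 = "How can I lose weight quickly?"
q2 = "What is the fastest way to shed pounds?"  # paraphrase of q1
q3 = "How can I gain weight quickly?"           # opposite information need

print(jaccard(q1, q2))            # 0.0: the paraphrase shares no words
print(round(jaccard(q1, q3), 2))  # 0.71: high overlap, different need
```

In this toy case the paraphrase receives the lowest possible score while a question with a different information need scores high, which is the kind of mismatch that motivates adding semantic, syntactic, and question type information.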

Sentence Similarity Metric and Its Application in FAQ System

Design and Implementation of FAQ Automatic Return System Based on Similarity Computation

Improve Semantic Web Services Discovery Through Similarity Search in Metric Space

Effects of Distance Information Between the Term and the Central Term on the Similar Question Matching

Syntactic Impact On Sentence Similarity Measure In Archive-Based Qa System

Question Matching Based on Fuzzy Set

Sentence Similarity Computation in Question Answering Robot

Utilizing Sentence Similarity and Question Type Similarity to Response to Similar Questions in Knowledge-Sharing Community

Improved Semantic Similarity Computation In Question-Answering System

Chinese Sentence Similarity Based on Word Context and Semantic

Semantic Similarity Metric and Its Application in Text Classification

Sentence Similarity Computation Based on Feature Set

A Model for Chinese Sentence Similarity Computing

Utilizing Semantic, Syntactic, And Question Category Information For Automated Digital Reference Services

Passage retrieval for web-based question answering

Sentence similarity computing based on multi-features fusion

Query similarity computing based on system similarity measurement

Chinese Sentence Similarity Based on Multi-feature Combination

Information Extraction and Similarity Computation for Semi-/Un-Structured Sentences from the Cyberdata

SSMT:A Machine Translation Evaluation View to Paragraph-to-Sentence Semantic Similarity

QUESTION ANSWERING QUALITY EVALUATION FOR COMMUNITY QUESTION ANSWERING BASED ON SIMILARITY