Abstract: Reusing the redundancy of information in question-answer pairs is one alternative approach to building question answering (QA) systems. If the same question has already been asked by other users, the QA system responds with the answer associated with the redundant question. Nevertheless, identifying similar questions is not trivial. Traditional text similarity measures are neither effective nor efficient at distinguishing the similarity of sentence-level text. Document similarity techniques are not effective because sentences are short and share very little word overlap. Furthermore, the similarity and relevance of sentences can be characterized at different levels, which differs from the standard notion of topicality used in document retrieval. In this paper, we focus on the problem of identifying questions that express the same information need. The main goal is to match questions with their paraphrases. To achieve this, we propose a hybrid question similarity approach that combines semantic, syntactic, and question type similarity. Semantic and syntactic information is measured by taking into account word similarity, word ordering, and part-of-speech information. Information about the type of a question is derived from a Support Vector Machine classifier. The experimental results show that our approach is highly effective in detecting redundant questions.

1. INTRODUCTION

For many years, knowledge-sharing community sites such as Yahoo! Answers have been accepting a large number of questions from millions of users. Given the current magnitude of questions and answers in their archives, it is plausible that a newly submitted question has already been asked by other users. However, current search engines are ineffective at finding such similar questions because of their inherent limitations.
Standard text retrieval approaches that compute the similarity of document-level text are neither effective nor efficient for matching natural language questions. First, document similarity techniques are fundamentally based on the degree of word overlap. This notion works well for identifying similar documents, since they are likely to have a sufficient number of words in common. Questions, on the other hand, are relatively short and often share very little word overlap. Furthermore, due to the generative power of natural language, the same question can be expressed in various ways. Hence, most questions are likely to receive a low similarity score from document similarity measures. The notion of topical relevance, which is central to standard information retrieval systems, …
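
The word-overlap limitation described above can be illustrated with a minimal sketch. It assumes Jaccard similarity over word sets as a stand-in for an overlap-based document similarity measure; the paper does not name a specific baseline, and the three example questions are hypothetical, not taken from its data set.

```python
import re


def tokens(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Word-overlap similarity: |A & B| / |A | B| over the two token sets."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0  # Two empty texts: define the similarity as zero.
    return len(ta & tb) / len(ta | tb)


# Hypothetical example questions (assumptions for illustration only).
q1 = "How can I lose weight quickly?"
q2 = "What is the fastest way to get thinner?"  # paraphrase of q1
q3 = "How can I lose my accent quickly?"        # different information need

paraphrase_score = jaccard(q1, q2)  # no shared words at all
unrelated_score = jaccard(q1, q3)   # many shared words

print(f"paraphrase pair:     {paraphrase_score:.3f}")
print(f"different-need pair: {unrelated_score:.3f}")
```

In this toy case the paraphrase pair shares no words and scores lower than the pair that asks about a different need. This is the failure mode that motivates combining semantic, syntactic, and question type information rather than relying on overlap alone.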

Detecting Duplicate Questions in Stack Overflow Via Semantic and Relevance Approaches

Detecting Duplicate Questions in Stack Overflow Via Source Code Modeling

Detecting Duplicate Questions in Stack Overflow Via Deep Learning Approaches

Multi-Factor Duplicate Question Detection in Stack Overflow

Enhancing User Experience on Q&A Platforms: Measuring Text Similarity Based on Hybrid CNN-LSTM Model for Efficient Duplicate Question Detection

Attention-based model for predicting question relatedness on Stack Overflow

Mining Duplicate Questions of Stack Overflow

Identifying Semantically Duplicate Questions Using Data Science Approach: A Quora Case Study

Negative Results of Image Processing for Identifying Duplicate Questions on Stack Overflow

Interpretable duplicate question detection models based on attention mechanism

Employing Siamese MaLSTM Model and ELMO Word Embedding for Quora Duplicate Questions Detection

Same-Same But Different: On Understanding Duplicates in Stack Overflow

Feature Analysis for Duplicate Detection in Programming QA Communities.

MQDD: Pre-training of Multimodal Question Duplicity Detection for Software Engineering Domain

An Intent-based and Annotation-free Method for Duplicate Question Detection in CQA Forums

Generating Question Titles for Stack Overflow from Mined Code Snippets

Siamese Neural Networks with Random Forest for detecting duplicate question pairs

Improving Stack Overflow question title generation with copying enhanced CodeBERT model and bi-modal information

Deep Learning Approaches to Semantic Relevance Modeling for Chinese Question-Answer Pairs.

Utilizing Sentence Similarity and Question Type Similarity to Response to Similar Questions in Knowledge-Sharing Community

Code2Que: a tool for improving question titles from mined code snippets in stack overflow