Abstract: Chinese developers often cannot search effectively for questions in English, because they may have difficulty translating technical terms from Chinese to English and formulating proper English queries. To help Chinese developers take advantage of the rich knowledge base of Stack Overflow and to simplify the question retrieval process, we propose an automated cross-language relevant question retrieval (CLRQR) system that retrieves relevant English questions for a given Chinese question. CLRQR first extracts essential information (both Chinese and English) from the title and description of the input Chinese question, then performs domain-specific translation of the essential Chinese information into English, and finally formulates an English query for retrieving relevant questions from a repository of English Stack Overflow questions. We propose three retrieval algorithms (word-embedding, word-matching, and vector-space-model based methods) that exploit different document representations and similarity metrics for question retrieval. To evaluate the performance of our approach and investigate the effectiveness of the different retrieval algorithms, we propose four baseline approaches that combine different sources of query words, query formulation mechanisms, and search engines. We randomly select 80 Java, 20 Python, and 20 .NET questions from SegmentFault and V2EX (two Chinese Q&A websites for computer programming) as the Chinese query questions. We conduct a user study to evaluate the relevance of the English questions retrieved by CLRQR with each retrieval algorithm and by the four baseline approaches. The experimental results show that CLRQR with word-embedding based retrieval achieves the best performance.
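To make the three-step pipeline concrete, the following is a minimal sketch of the extract, translate, and retrieve flow using the vector-space-model variant with TF-IDF weights and cosine similarity. It is an illustration under stated assumptions, not the CLRQR implementation: the `GLOSSARY` dictionary is a hypothetical stand-in for domain-specific translation, the regular expressions are a simplified stand-in for essential-information extraction, and all function names are invented for this example.

```python
import math
import re
from collections import Counter

# Hypothetical toy glossary standing in for domain-specific translation.
GLOSSARY = {"线程": "thread", "死锁": "deadlock", "异常": "exception"}


def tokenize(text):
    """Lowercase an English question and split it into simple word tokens."""
    return re.findall(r"[a-z0-9_.#+]+", text.lower())


def formulate_query(chinese_question):
    """Keep English words found in the question and add glossary translations."""
    english = re.findall(r"[A-Za-z][A-Za-z0-9_.#+]*", chinese_question)
    translated = [en for zh, en in GLOSSARY.items() if zh in chinese_question]
    return [w.lower() for w in english] + translated


def tfidf_vectors(docs_tokens):
    """Build one sparse TF-IDF vector (a dict) per document, plus the IDF table."""
    n = len(docs_tokens)
    df = Counter(t for doc in docs_tokens for t in set(doc))
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
    vectors = [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs_tokens]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(chinese_question, english_questions, k=3):
    """Rank English questions by cosine similarity to the formulated query."""
    vectors, idf = tfidf_vectors([tokenize(q) for q in english_questions])
    query_counts = Counter(formulate_query(chinese_question))
    # Query terms unseen in the repository get weight 0 and do not affect ranking.
    query_vec = {t: c * idf.get(t, 0.0) for t, c in query_counts.items()}
    ranked = sorted(
        zip(english_questions, vectors),
        key=lambda pair: cosine(query_vec, pair[1]),
        reverse=True,
    )
    return [question for question, _ in ranked[:k]]


repository = [
    "How to avoid deadlock between two threads in Java",
    "Parse JSON in Python",
    "Catch exception in Java stream",
]
results = retrieve("Java 线程死锁怎么解决", repository, k=2)
```

In this toy run the query mixes the English word kept from the Chinese question (`java`) with the translated terms (`thread`, `deadlock`). Note that `thread` does not match `threads` in the repository, which hints at why exact word matching can miss relevant questions and why an embedding-based similarity may be more robust.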
