Abstract: Chinese developers often cannot search effectively for questions in English, because they may have difficulty translating technical terms from Chinese to English and formulating proper English queries. To help Chinese developers take advantage of the rich knowledge base of Stack Overflow and to simplify the question retrieval process, we propose an automated cross-language relevant question retrieval (CLRQR) system that retrieves relevant English questions for a given Chinese question. CLRQR first extracts essential information (both Chinese and English) from the title and description of the input Chinese question, then performs domain-specific translation of the essential Chinese information into English, and finally formulates an English query for retrieving relevant questions from a repository of English Stack Overflow questions. We propose three retrieval algorithms (word-embedding, word-matching, and vector-space-model based methods) that exploit different document representations and similarity metrics for question retrieval. To evaluate the performance of our approach and investigate the effectiveness of the different retrieval algorithms, we propose four baseline approaches based on combinations of different sources of query words, query formulation mechanisms, and search engines. We randomly select 80 Java, 20 Python, and 20 .NET questions from SegmentFault and V2EX (two Chinese Q&A websites for computer programming) as the query Chinese questions. We conduct a user study to evaluate the relevance of the English questions retrieved by CLRQR with the different retrieval algorithms and by the four baseline approaches. The experimental results show that CLRQR with word-embedding based retrieval achieves the best performance.
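The abstract names three families of retrieval algorithms without giving their formulas. The sketch below shows one generic, textbook version of each, applied to an already-translated English query: a word-matching score (fraction of query words found in the question), a vector space model (TF-IDF vectors compared by cosine similarity), and a word-embedding score (cosine similarity of averaged word vectors). This is an illustration under stated assumptions, not CLRQR's actual implementation: the repository, the tiny hand-made embedding table, and all function names are hypothetical, and a real system would use embeddings trained on a large Stack Overflow corpus.

```python
import math
from collections import Counter

# Hypothetical toy repository of English Stack Overflow question titles.
REPOSITORY = [
    "how to convert string to int in java",
    "java hashmap iteration order",
    "python list comprehension with condition",
]

# Hypothetical 3-dimensional word vectors (illustration only; real vectors
# would be learned, e.g. with word2vec, from a large programming corpus).
EMBEDDINGS = {
    "java": [0.9, 0.1, 0.0],
    "python": [0.1, 0.9, 0.0],
    "string": [0.2, 0.2, 0.8],
    "int": [0.3, 0.1, 0.7],
    "convert": [0.2, 0.3, 0.6],
    "hashmap": [0.8, 0.0, 0.3],
    "list": [0.1, 0.7, 0.4],
}


def tokenize(text):
    """Lowercase whitespace tokenization (a simplifying assumption)."""
    return text.lower().split()


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def word_matching_score(query, doc):
    """Word matching: fraction of distinct query words that occur in the document."""
    q, d = set(tokenize(query)), set(tokenize(doc))
    return len(q & d) / len(q) if q else 0.0


def vsm_scores(query, docs):
    """Vector space model: TF-IDF vectors over the repository vocabulary, cosine-compared."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    n = len(docs)
    # Smoothed IDF; every vocabulary word occurs in at least one document.
    idf = {w: math.log(n / sum(1 for toks in tokenized if w in toks)) + 1.0 for w in vocab}

    def tfidf(tokens):
        tf = Counter(tokens)
        return [tf[w] * idf[w] for w in vocab]  # query words outside vocab are ignored

    q_vec = tfidf(tokenize(query))
    return [cosine(q_vec, tfidf(toks)) for toks in tokenized]


def mean_vector(tokens):
    """Average the vectors of tokens that have an embedding; None if none do."""
    vecs = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vecs:
        return None
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(len(vecs[0]))]


def embedding_score(query, doc):
    """Word embedding: cosine similarity of the averaged query and document vectors."""
    q, d = mean_vector(tokenize(query)), mean_vector(tokenize(doc))
    return cosine(q, d) if q is not None and d is not None else 0.0


def rank(docs, scores):
    """Return documents sorted by descending score (ties keep repository order)."""
    return [doc for _, doc in sorted(zip(scores, docs), key=lambda p: p[0], reverse=True)]


# Usage: a query assumed to be already translated from Chinese into English.
query = "java convert string to int"
for name, scores in [
    ("word-matching", [word_matching_score(query, d) for d in REPOSITORY]),
    ("vector-space-model", vsm_scores(query, REPOSITORY)),
    ("word-embedding", [embedding_score(query, d) for d in REPOSITORY]),
]:
    print(name, "->", rank(REPOSITORY, scores)[0])
```

The three scorers differ in representation (word sets, weighted term vectors, dense vectors), which is the distinction the abstract draws. Only the embedding variant can credit a question that shares related but not identical words with the query, since "hashmap" and "java" have nearby vectors while sharing no surface form.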

Applying Machine Translation to Two-Stage Cross-Language Information Retrieval

Japanese/English Cross-Language Information Retrieval: Exploration of Query Translation and Transliteration

Cross-Language Information Retrieval for Technical Documents

Cross-Lingual Text Image Recognition Via Multi-Task Sequence to Sequence Learning

Research on Lucene-based English-Chinese Cross-Language Information Retrieval

Research on Chinese-English Cross-Language Information Retrieval

Domain-specific Cross-Language Relevant Question Retrieval

Exploiting Neural Query Translation into Cross-Lingual Information Retrieval

An Application of Machine Translation Technology in Multilingual Information Retrieval

Learning to Exploit Different Translation Resources for Cross-Language Information Retrieval

Embedding Web-based Statistical Translation Models in Cross-Language Information Retrieval

Research on English-Chinese Bi-Directional Cross-Language Information Retrieval

English-Chinese Cross-Language Information Retrieval Using Lucene System

Opening Machine Translation Black Box for Cross-Language Information Retrieval

Dictionary-based Method for English-Chinese Bidirectional Cross-Language Information Retrieval

Distillation for Multilingual Information Retrieval

Translate-Distill: Learning Cross-Language Dense Retrieval by Translation and Distillation

Cross-Language Information Retrieval Based on Multiple Information

Simple Yet Effective Neural Ranking and Reranking Baselines for Cross-Lingual Information Retrieval

Towards Web Mining of Query Translations for Cross-Language Information Retrieval in Digital Libraries

Constraint Translation Candidates: A Bridge between Neural Query Translation and Cross-Lingual Information Retrieval