Abstract: Chinese developers often cannot search effectively for questions in English, because they may have difficulty translating technical terms from Chinese into English and formulating proper English queries. To help Chinese developers take advantage of the rich knowledge base of Stack Overflow and to simplify question retrieval, we propose an automated cross-language relevant question retrieval (CLRQR) system that retrieves relevant English questions for a given Chinese question. CLRQR first extracts essential information (both Chinese and English) from the title and description of the input Chinese question. It then performs domain-specific translation of the essential Chinese information into English. Finally, it formulates an English query to retrieve relevant questions from a repository of English Stack Overflow questions. We propose three retrieval algorithms (word-embedding-based, word-matching-based, and vector-space-model-based) that use different document representations and similarity metrics for question retrieval. To evaluate our approach and investigate the effectiveness of the different retrieval algorithms, we propose four baseline approaches that combine different sources of query words, query formulation mechanisms, and search engines. We randomly select 80 Java, 20 Python, and 20 .NET questions from SegmentFault and V2EX (two Chinese Q&A websites for computer programming) as the Chinese query questions. We conduct a user study to evaluate the relevance of the English questions retrieved by CLRQR with each retrieval algorithm and by the four baseline approaches. The experimental results show that CLRQR with word-embedding-based retrieval achieves the best performance.
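The abstract names three retrieval algorithms but does not give their exact formulations. The following is a minimal sketch that contrasts the three kinds of document representation. It assumes that the Chinese question has already been translated into an English query and that each method ranks candidate questions by a single similarity score. The three representations are:

- a set of words, for word matching;
- a sparse TF-IDF vector compared by cosine similarity, for the vector space model;
- an averaged dense word vector compared by cosine similarity, for word embeddings.

All function names (`tokenize`, `word_matching_rank`, `vsm_rank`, `embedding_rank`) and the toy embeddings are hypothetical illustrations, not CLRQR's actual implementation.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption); real queries about
    # Java, Python or .NET would need domain-aware tokenization.
    return text.lower().split()


def rank(scores):
    # Indices of candidate questions, best score first.
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def word_matching_rank(query, questions):
    # Word matching: score = number of distinct query words found in a question.
    q = set(tokenize(query))
    return rank([len(q & set(tokenize(text))) for text in questions])


def sparse_cosine(u, v):
    # Cosine similarity between two sparse {term: weight} vectors.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def vsm_rank(query, questions):
    # Vector space model: TF-IDF weights (idf = log(N / df)) plus cosine similarity.
    docs = [tokenize(text) for text in questions]
    df = Counter(t for doc in docs for t in set(doc))
    idf = {t: math.log(len(docs) / n) for t, n in df.items()}
    doc_vecs = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    q_vec = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    return rank([sparse_cosine(q_vec, d) for d in doc_vecs])


def mean_vector(tokens, embeddings):
    # Average the vectors of tokens that have an embedding; None if none do.
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return None
    return [sum(col) / len(known) for col in zip(*known)]


def dense_cosine(u, v):
    # Cosine similarity between two dense vectors of equal length.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def embedding_rank(query, questions, embeddings):
    # Word embedding: cosine between averaged word vectors of query and question.
    q = mean_vector(tokenize(query), embeddings)
    scores = []
    for text in questions:
        d = mean_vector(tokenize(text), embeddings)
        scores.append(dense_cosine(q, d) if q and d else 0.0)
    return rank(scores)


questions = [
    "how to convert string to int in java",
    "python list comprehension with condition",
    "java hashmap iterate over entries",
]
query = "java convert string int"  # stands in for the translated English query
# Hypothetical 2-d toy embeddings; a real system would use trained vectors.
toy_embeddings = {
    "java": [1.0, 0.0], "hashmap": [0.9, 0.1], "string": [0.8, 0.2],
    "int": [0.7, 0.3], "python": [0.0, 1.0], "list": [0.1, 0.9],
}

print(word_matching_rank(query, questions))  # [0, 2, 1]
print(vsm_rank(query, questions))  # [0, 2, 1]
print(embedding_rank(query, questions, toy_embeddings))  # [0, 2, 1]
```

On this toy data all three methods produce the same ranking, but they score words differently. Word matching and TF-IDF reward only identical tokens. The embedding variant also gives partial credit to words that lie close together in vector space, such as the toy vectors for "hashmap" and "java". That is one plausible reason an embedding-based method could cope better with imperfect translations, although the abstract does not state the paper's explanation.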

Domain-specific Cross-Language Relevant Question Retrieval.