Using Hybrid Kernel Method for Question Classification in CQA

Fan Shixi, Wang Xiaolong, Wang Xuan, Yang Xiaohong
DOI: https://doi.org/10.1007/978-3-642-24965-5_14
2011-01-01
Abstract: A new question classification approach is presented for questions in CQA (Community Question and Answering Systems). In CQA, most questions are non-factoid and, unlike factoid questions, can hardly be classified by answer type. A coarse-grained category scheme is introduced, and a multi-label classification method is used: a question can belong to several categories rather than exactly one, so the classification result is a set of categories. Multi-label classification follows a two-step strategy. In the first step, a series of binary classifiers, one per question category, is applied separately. In the second step, the outputs of those classifiers are combined, and the resulting set of question categories is returned as the classification result. Each binary classifier uses a hybrid kernel model that combines a tree kernel with a polynomial kernel. A data set of 22,000 questions is built, of which 20,000 are used as training data and the remaining 2,000 as test data. Experimental results show that the hybrid model is effective. A question paraphrase recognition experiment is carried out to verify the effectiveness of multi-label classification. The results show that multi-label classification outperforms single-label classification for questions in CQA.
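The two-step strategy and the hybrid kernel described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the tree kernel is simplified to counting shared production rules, the hybrid is taken to be a weighted sum with a hypothetical weight `alpha`, a kernel perceptron stands in for whatever kernel classifier the paper trains, and the toy questions, category names, and function names are invented.

```python
from collections import Counter

def poly_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel (x.y + coef0)^degree on sparse bag-of-words dicts."""
    dot = sum(v * y.get(k, 0) for k, v in x.items())
    return (dot + coef0) ** degree

def productions(tree):
    """Count production rules in a parse tree given as nested tuples (label, child, ...)."""
    if isinstance(tree, str):  # a leaf word has no productions
        return Counter()
    label, *children = tree
    rule = (label, tuple(c if isinstance(c, str) else c[0] for c in children))
    counts = Counter([rule])
    for child in children:
        counts += productions(child)
    return counts

def tree_kernel(t1, t2):
    """Simplified tree kernel (assumption): number of matching production-rule pairs."""
    p1, p2 = productions(t1), productions(t2)
    return sum(n * p2[rule] for rule, n in p1.items())

def hybrid_kernel(a, b, alpha=0.5):
    """Weighted sum of the two kernels; a and b are (tree, bag_of_words) pairs."""
    return alpha * tree_kernel(a[0], b[0]) + (1 - alpha) * poly_kernel(a[1], b[1])

def decision(x, samples, labels, alphas, kernel):
    """Dual-form decision value of one binary classifier."""
    return sum(a * y * kernel(s, x) for a, y, s in zip(alphas, labels, samples))

def train_binary(samples, labels, kernel, epochs=10):
    """Kernel perceptron (stand-in classifier); labels are +1 or -1."""
    alphas = [0] * len(samples)
    for _ in range(epochs):
        for i, x in enumerate(samples):
            if labels[i] * decision(x, samples, labels, alphas, kernel) <= 0:
                alphas[i] += 1  # mistake-driven update
    return alphas

def train_multilabel(samples, label_sets, categories, kernel):
    """Step 1: train one binary classifier per category, each independently."""
    models = {}
    for cat in categories:
        labels = [1 if cat in s else -1 for s in label_sets]
        models[cat] = (labels, train_binary(samples, labels, kernel))
    return models

def predict_set(x, samples, models, kernel):
    """Step 2: combine the binary decisions into a set of categories."""
    return {cat for cat, (labels, alphas) in models.items()
            if decision(x, samples, labels, alphas, kernel) > 0}

# Toy data: each question is (parse tree, bag of words); labels are category sets.
q1 = (("S", ("WH", "how"), ("VP", "install")), {"how": 1, "install": 1})
q2 = (("S", ("WH", "why"), ("VP", "crash")), {"why": 1, "crash": 1})
q3 = (("S", ("WH", "how"), ("VP", "crash")), {"how": 1, "crash": 1})
samples = [q1, q2, q3]
label_sets = [{"method"}, {"reason"}, {"method", "reason"}]
models = train_multilabel(samples, label_sets, ["method", "reason"], hybrid_kernel)
```

Calling `predict_set(q3, samples, models, hybrid_kernel)` returns a set that can hold more than one category, which is the point of the multi-label formulation. A weighted sum is used here because a non-negative combination of valid kernels is itself a valid kernel; how the paper actually combines the two kernels is not stated in the abstract.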