Question Classification in Question Answering Based on Real-World Web Data Sets

袁晓洁,于士涛,师建兴,陈秋双
DOI: https://doi.org/10.3969/j.issn.1003-7985.2008.03.005
2008-01-01
Abstract: To improve question answering (QA) performance on real-world web data sets, a new set of question classes and a general answer re-ranking model are defined. Using a predefined dictionary and grammatical analysis, the question classifier brings both semantic and grammatical information into information retrieval and machine learning methods in the form of training features. These features include the question word, the main verb of the question, the dependency structure, the position of the main auxiliary verb, the main noun of the question, and the top hypernym of the main noun, among others. The QA query results are then re-ranked using the question class information. Experiments show that the classifier accurately classifies questions from real-world web data sets and that re-ranking clearly improves the QA results. These results indicate that combining semantic and grammatical information can improve the performance of applications, such as QA, that are built on real-world web data sets.
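
The two steps summarized in the abstract, turning a question into training features and re-ranking results by question class, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the word lists, the function names extract_features and rerank, the class labels, and the additive boost are all assumptions made for illustration, and only two of the surface features (question word and auxiliary-verb position) are covered. The main verb, main noun, dependency structure, and hypernym features would require a parser and a lexical resource, which are omitted here.

```python
# Hypothetical word lists; the abstract does not specify the dictionary used.
QUESTION_WORDS = {"what", "who", "when", "where", "why", "how", "which"}
AUXILIARIES = {"is", "are", "was", "were", "do", "does", "did",
               "can", "could", "will", "would", "should"}


def extract_features(question):
    """Return two simple surface features for a question string."""
    tokens = question.lower().rstrip("?").split()
    # First question word found, or "none" if the question has none.
    question_word = next((t for t in tokens if t in QUESTION_WORDS), "none")
    # Index of the first auxiliary verb, or -1 if none is present.
    aux_position = next(
        (i for i, t in enumerate(tokens) if t in AUXILIARIES), -1
    )
    return {"question_word": question_word, "aux_position": aux_position}


def rerank(candidates, question_class, boost=0.5):
    """Re-rank (text, score, answer_class) tuples.

    Candidates whose answer_class matches the predicted question class
    receive an additive boost (an assumed scoring rule, chosen for
    simplicity), then all candidates are sorted by adjusted score.
    """
    def adjusted(candidate):
        _text, score, answer_class = candidate
        return score + (boost if answer_class == question_class else 0.0)

    return sorted(candidates, key=adjusted, reverse=True)


features = extract_features("Where is the Eiffel Tower?")
ranked = rerank(
    [("Gustave Eiffel", 0.9, "PERSON"), ("Paris", 0.7, "LOCATION")],
    question_class="LOCATION",
)
```

In this sketch the features dictionary would be passed to a trained classifier to predict the question class; here the class "LOCATION" is supplied by hand so that the re-ranking step can be shown on its own.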