Learning Multiple Level Features for Opinion Analysis

Ruifeng Xu, Chunyu Kit
DOI: https://doi.org/10.1142/s1793840611002401
2011-01-01
International Journal of Computer Processing of Languages
Abstract: In recent years, there has been increasing interest in opinion analysis, which seeks to identify and classify opinions in text automatically. Many existing opinion analysis techniques rely on opinion lexicons as their most important features but lack the capacity to learn new opinion words and, especially, the contextual behavior of opinion words. Additionally, many works employ intra-sentence features, while contextual inter-sentence features are not well studied. This paper presents an opinion analysis technique that incorporates multi-level intra-sentence features, from the punctuation level through the word level to the collocation level, together with an inter-sentence feature. Based on systematic observation and analysis of an opinion corpus, linguistic clues and corresponding features for opinion analysis are identified. These features are incorporated into a support vector machine (SVM) based classifier to identify opinionated sentences in running text and determine their polarities. To overcome the barrier caused by insufficient annotated training data, a semi-supervised learning algorithm is designed to train the classifier and enrich the opinion-related knowledge by using high-quality instances identified from large raw text as new training data interactively. Evaluations on the datasets of the NTCIR-6 opinion analysis task (OAT-6) and the NTCIR-7 multilingual opinion analysis task (MOAT-7) show that the proposed opinion analysis approach achieved encouraging performance.
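The semi-supervised step described in the abstract can be illustrated with a generic self-training loop. This is only a rough sketch under stated assumptions, not the authors' algorithm: a simple perceptron stands in for the SVM so that the example needs nothing beyond the standard library, the features are toy stand-ins for the punctuation-, word- and collocation-level features, and every name (`featurize`, `train_perceptron`, `self_train`) and the `threshold` value are hypothetical.

```python
def featurize(sentence):
    """Toy multi-level features: words, adjacent-word pairs, and !/? marks."""
    feats = {}
    words = []
    for tok in sentence.lower().split():
        for ch in tok:
            if ch in "!?":  # punctuation-level feature
                feats["p=" + ch] = feats.get("p=" + ch, 0) + 1
        word = tok.strip("!?.,")
        if word:  # word-level feature
            words.append(word)
            feats["w=" + word] = feats.get("w=" + word, 0) + 1
    for a, b in zip(words, words[1:]):  # crude collocation-level feature
        key = "c=" + a + "_" + b
        feats[key] = feats.get(key, 0) + 1
    return feats


def score(weights, feats):
    """Signed score; its magnitude is used as a confidence estimate."""
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())


def train_perceptron(examples, epochs=10):
    """Perceptron used here as a stand-in for the SVM (assumption)."""
    weights = {}
    for _ in range(epochs):
        for feats, label in examples:  # label is +1 or -1
            if label * score(weights, feats) <= 0:
                for f, v in feats.items():
                    weights[f] = weights.get(f, 0.0) + label * v
    return weights


def self_train(labeled, unlabeled, threshold=2.0, max_rounds=5):
    """Move confidently scored raw sentences into the training set, retrain."""
    labeled, pool = list(labeled), list(unlabeled)
    weights = train_perceptron([(featurize(s), y) for s, y in labeled])
    for _ in range(max_rounds):
        confident, remaining = [], []
        for s in pool:
            margin = score(weights, featurize(s))
            if abs(margin) >= threshold:
                confident.append((s, 1 if margin > 0 else -1))
            else:
                remaining.append(s)
        if not confident:  # nothing new to learn from
            break
        labeled.extend(confident)
        pool = remaining
        weights = train_perceptron([(featurize(s), y) for s, y in labeled])
    return weights, labeled


# Hypothetical seed annotations and raw text, for illustration only.
seed = [("great movie", 1), ("terrible movie", -1)]
raw = ["great great film", "terrible terrible film", "movie"]
weights, grown = self_train(seed, raw)
```

In this toy run, only raw sentences whose score magnitude reaches the threshold are added to the training set, so the neutral sentence stays unlabeled. A real system would need a calibrated confidence measure and a stopping rule tuned on held-out data.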