Large Scale Sentiment Analysis with Locality Sensitive BitHash.

Wenhao Zhang, Jianqiu Ji, Jun Zhu, Hua Xu, Bo Zhang
DOI: https://doi.org/10.1007/978-3-319-28940-3_3
2015-01-01
Abstract: As social media data grows rapidly, sentiment analysis plays an increasingly important role in classifying the opinions, attitudes, and feelings that users express in text. However, most studies have focused on the effectiveness of sentiment analysis while ignoring storage efficiency when processing large-scale, high-dimensional text data. In this paper, we combine machine-learning-based sentiment analysis with our proposed Locality Sensitive One-Bit Min-Hash (BitHash) method. BitHash compresses each data sample into a compact binary hash code while preserving the pairwise similarity of the original data. The binary code can replace the original data as a compressed yet informative representation for subsequent processing; for example, it can be naturally integrated with a classifier such as an SVM. Using the compact hash code significantly reduces storage space. Experiments on a popular open benchmark dataset show that, as the hash code length increases, the classification accuracy of our method approaches that of the state-of-the-art method while requiring significantly less storage.
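The abstract does not spell out how BitHash is constructed, so the following is only a minimal sketch of generic one-bit min-hashing, not the authors' method. It assumes a bag-of-words document is treated as a set of token IDs, that each hash function is a random linear map modulo a large prime, and that the lowest bit of each minimum is kept. All function names (`make_hash_params`, `tokens_to_ids`, `one_bit_minhash`, `estimate_jaccard`) are hypothetical, and the similarity estimate relies on the idealized assumption that the bits of two different minima agree half of the time.

```python
import random
import zlib

# Large Mersenne prime used as the modulus of the (assumed) linear hash family.
PRIME = (1 << 61) - 1


def make_hash_params(num_bits, seed=0):
    """Draw one (a, b) pair per output bit; the seed keeps codes reproducible."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_bits)]


def tokens_to_ids(text):
    """Map whitespace-separated tokens to stable integer IDs (a set, so order is ignored)."""
    return {zlib.crc32(word.encode("utf-8")) for word in text.lower().split()}


def one_bit_minhash(token_ids, params):
    """Return one bit per hash function: the parity of the minimum hash value.

    token_ids must be non-empty.
    """
    bits = []
    for a, b in params:
        minimum = min((a * t + b) % PRIME for t in token_ids)
        bits.append(minimum & 1)
    return bits


def estimate_jaccard(bits_a, bits_b):
    """Estimate Jaccard similarity from the fraction of matching bits.

    Under the idealized assumption P(match) = (1 + J) / 2, so J = 2 * match - 1.
    Negative estimates are clipped to 0.
    """
    match = sum(x == y for x, y in zip(bits_a, bits_b)) / len(bits_a)
    return max(0.0, 2.0 * match - 1.0)


if __name__ == "__main__":
    params = make_hash_params(256, seed=42)
    code_a = one_bit_minhash(tokens_to_ids("the movie was great and moving"), params)
    code_b = one_bit_minhash(tokens_to_ids("the movie was great but long"), params)
    print(len(code_a), "bits per document")
    print("estimated similarity:", estimate_jaccard(code_a, code_b))
```

In a pipeline like the one the abstract describes, each document's bit vector would stand in for its high-dimensional text features and could be passed to a linear classifier such as an SVM; with 256 one-bit hashes, a document is stored in 32 bytes regardless of vocabulary size.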