Sentiment lexicon for cross-domain adaptation with multi-domain dataset in Indian languages enhanced with BERT classification model

K. Suresh Kumar,C. Helen Sulochana,A.S. Radhamani,T. Ananth Kumar
DOI: https://doi.org/10.3233/jifs-220448
2022-09-22
Journal of Intelligent & Fuzzy Systems
Abstract:Many websites are attempting to offer a platform for users or customers to leave their reviews and comments about the products or services in their native languages. The cross-domain adaptation (CDA) analyses sentiment across domains. The sentiment lexicon falls short resulting in issues like feature mismatch, sparsity, polarity mismatch and polysemy. In this research, an augmented sentiment dictionary is developed in our native regional language (Tamil) that intends to construct the contextual links between terms in multi-domain datasets to reduce problems like polarity mismatch, feature mismatch, and polysemy. Data from the source domain and target domain both labeled and unlabeled are used in the proposed dictionary. To be more specific, the initial dictionary uses normalised pointwise mutual information (nPMI) to derive contextual weight, whereas the final dictionary uses the value of terms across all reviews to compute the accurate rank score. Here, a deep learning model called BERT is used for sentiment classification. For cross-domain adaptation, a modified multi-layer fuzzy-based convolutional neural network (M-FCNN) is deployed. This work aims to build a single dictionary using large number of vocabularies for classifying the reviews in Tamil for several target domains. This extendible dictionary enhances the accuracy of CDA greatly when compared to existing baseline techniques and easily handles a large number of terms in different domains.
What problem does this paper attempt to address?