Analyzing Consumer Reviews for Understanding Drivers of Hotels Ratings: An Indian Perspective

Subhasis Dasgupta, Soumya Roy, Jaydip Sen
2024-08-08
Abstract: In the internet era, almost every business entity is trying to establish its footprint in digital media and on social media platforms. For these entities, word of mouse is also very important. This is particularly crucial for the hospitality sector, which deals with hotels, restaurants, etc. Consumers read other consumers' reviews before making final decisions. It is therefore very important to understand which aspects weigh most on consumers' minds when they give their ratings. The current study focuses on consumer reviews of Indian hotels to extract the aspects important for final ratings. The study involves gathering data using web scraping methods, analyzing the texts using Latent Dirichlet Allocation for topic extraction, and applying sentiment analysis for aspect-specific sentiment mapping. Finally, it incorporates Random Forest to understand the importance of the aspects in predicting the final rating of a user.
Computation and Language, Machine Learning
What problem does this paper attempt to address?
The aim of this paper is to address the problem of extracting key factors that influence consumer ratings by analyzing consumer reviews of hotels in India. Specifically, the research objectives include the following aspects:

1. **Data Collection and Preprocessing**: Over 44,000 public reviews for 186 hotels in different regions of India were scraped from the TripAdvisor website. These reviews were preprocessed by removing unnecessary punctuation, numbers, and rare words, and by performing lemmatization to extract word roots.
2. **Topic Modeling**: The Latent Dirichlet Allocation (LDA) method was used to perform topic modeling on the preprocessed text, with the aim of automatically identifying the main aspects or topics that influence consumer ratings.
3. **Sentiment Analysis**: A DistilBERT-based deep learning model was employed to perform sentiment analysis on the sentences in each review, obtaining sentiment scores for each sentence to understand consumers' attitudes towards different topics.
4. **Classification Modeling**: Random Forest, Logistic Regression, XGBoost, and LightGBM classification models were constructed to understand how sentiment scores for different topics affect users' final ratings.
5. **Feature Importance Analysis**: The impact of different topics on user ratings was assessed using XGBoost's gain scores and SHAP values, revealing which factors most influence consumer satisfaction.

In summary, the goal of this paper is to explore and quantify which aspects of consumer reviews (such as room quality, location, service, etc.) most influence consumers' overall evaluation of hotels in India. This is crucial for hotel managers as it can help them better understand customer needs and preferences, thereby improving service quality and increasing customer satisfaction.
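The preprocessing step described above (removing punctuation, numbers, and rare words) can be illustrated with a minimal sketch. This is not the authors' code: the function names `tokenize` and `preprocess`, the lowercasing, the `min_count` threshold, and the sample reviews are all assumptions made for illustration. Lemmatization is left out because it would need an external library such as NLTK or spaCy.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep only alphabetic runs.

    Dropping everything that is not a-z removes punctuation and numbers
    in one pass. Lowercasing is an assumption, not stated in the summary.
    """
    return re.findall(r"[a-z]+", text.lower())


def preprocess(reviews, min_count=2):
    """Tokenize each review, then drop rare words.

    A word is treated as rare if it occurs fewer than `min_count` times
    across the whole corpus (hypothetical threshold).
    """
    tokenized = [tokenize(review) for review in reviews]
    counts = Counter(tok for doc in tokenized for tok in doc)
    return [[tok for tok in doc if counts[tok] >= min_count] for doc in tokenized]


# Made-up sample reviews, not taken from the scraped dataset.
reviews = [
    "The room was clean, and staff were friendly!",
    "Room 204 had a great view; staff helped with our 3 bags.",
    "Clean room, great location.",
]
cleaned = preprocess(reviews)
print(cleaned)
```

Token lists of this form are the kind of input a topic model such as LDA would take after being converted to a document-term matrix. On a corpus of over 44,000 reviews the rare-word threshold would likely be set much higher than 2.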