Analysis of Tourists' Satisfaction with Scenic Spots

Yu Su, Xinyuan Guo
DOI: https://doi.org/10.25236/AJHSS.2021.040705
2021-08-18
Abstract: This article analyzes tourists' comments on scenic spots using TF-IDF, principal component analysis and logistic regression, and identifies the factors that influence tourists' satisfaction with a scenic spot. The data are first preprocessed. The text is then segmented with the precise mode of the jieba word segmenter, and the 20 most frequent words are computed for each scenic spot and hotel. The top 20 words mined from the 50 scenic spots (hotels) are merged into a single data pool, the TF-IDF algorithm is used to compute the weight of each feature term, and the dimensionality of these weights is reduced with kernel principal component analysis (KernelPCA) to obtain the weight matrix. The data are then processed in two ways: by classification and by regression. For classification, the scenic spot (hotel) score serves as the class label, and supervised learning is performed with naive Bayes, a support vector machine, a BP (back-propagation) neural network and logistic regression. For regression, models are evaluated by the mean squared error (MSE). Under the MSE metric, the classification approach performs better than the regression approach.
Business, Computer Science
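The text-mining steps described in the abstract (counting each spot's top 20 words, merging them into a shared word pool, and weighting that pool with TF-IDF), together with the MSE metric used to compare the two approaches, can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' code. The input layout and the function names (`top_words`, `build_pool`, `tfidf_matrix`, `mse`) are hypothetical, and the plain tf × log(N/df) weighting is an assumption because the abstract does not specify the TF-IDF variant. Segmentation is assumed to have been done already, for example with jieba's precise mode (`jieba.lcut(text, cut_all=False)`).

```python
import math
from collections import Counter

# Hypothetical input layout: each scenic spot (or hotel) maps to its comments,
# each comment already segmented into words (e.g. by jieba's precise mode).
Corpus = dict[str, list[list[str]]]


def top_words(comments: list[list[str]], k: int = 20) -> list[str]:
    """Return the k most frequent words across one spot's segmented comments."""
    counts = Counter(word for comment in comments for word in comment)
    # Ties keep first-encountered order, as documented for Counter.most_common.
    return [word for word, _ in counts.most_common(k)]


def build_pool(corpus: Corpus, k: int = 20) -> list[str]:
    """Merge every spot's top-k words into one sorted, de-duplicated word pool."""
    pool: set[str] = set()
    for comments in corpus.values():
        pool.update(top_words(comments, k))
    return sorted(pool)


def tfidf_matrix(corpus: Corpus, vocab: list[str]) -> dict[str, list[float]]:
    """Weight each pooled word per spot with tf * log(N / df) (assumed variant).

    All comments of one spot are treated as a single document: tf is the word's
    share of that document's tokens, and df is the number of spots whose
    comments contain the word.
    """
    docs = {
        spot: Counter(word for comment in comments for word in comment)
        for spot, comments in corpus.items()
    }
    n_docs = len(docs)
    df = {w: sum(1 for counts in docs.values() if w in counts) for w in vocab}
    matrix: dict[str, list[float]] = {}
    for spot, counts in docs.items():
        total = sum(counts.values()) or 1  # guard against a spot with no tokens
        matrix[spot] = [
            (counts[w] / total) * math.log(n_docs / df[w]) if df[w] else 0.0
            for w in vocab
        ]
    return matrix


def mse(y_true: list[float], y_pred: list[float]) -> float:
    """Mean squared error between true and predicted scores."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy corpus with English placeholder tokens (hypothetical data).
    corpus = {
        "spot_a": [["scenery", "beautiful"], ["scenery", "ticket", "expensive"]],
        "spot_b": [["ticket", "cheap"], ["service", "good"]],
    }
    vocab = build_pool(corpus, k=2)
    print(vocab)  # ['beautiful', 'cheap', 'scenery', 'ticket']
    print(tfidf_matrix(corpus, vocab))
    print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

In this toy run, "ticket" appears in both spots, so its log(N/df) factor is log(1) = 0 and it carries no weight; words specific to one spot are weighted up. In the pipeline the abstract describes, the resulting spot-by-word matrix would next be reduced with KernelPCA (scikit-learn offers `sklearn.decomposition.KernelPCA`) before the classifiers are trained, a step not reproduced here.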