Amelioration of linguistic semantic classifier with sentiment classifier manacle for the focused web crawler

K. S. Sakunthala Prabha, C. Mahesh, Sam Goundar, S. P. Raja
DOI: https://doi.org/10.1007/s41870-022-01139-w
2022-12-27
International Journal of Information Technology
Abstract: Sentiment-relevant information on web pages about products, establishments, and commodities resides principally in their textual content. Research on crawling sentiment-relevant web pages lags far behind research on crawling topic-relevant web pages, despite the steep rise in sentiment-relevant information on the web. This paper addresses that gap and proposes a novel focused web crawler, the Linguistic Semantic Sentiment (LSS) crawler, which collects not only topic-relevant web pages but also sentiment-relevant ones. The relevance computation module of the LSS crawler contains two classifiers: a linguistic semantic classifier and a sentiment classifier. The linguistic semantic classifier computes the semantic relevance of a web page to the topic, whereas the sentiment classifier computes the sentiment relevance of the web page. The performance of the LSS crawler is analyzed using three metrics: harvest rate, target recall, and F1-score. The LSS crawler outperformed the existing focused crawlers, with an average harvest rate of 0.35, target recall of 0.55, and F1-score of 0.42. The evaluation results show that both the linguistic semantic classifier and the sentiment classifier enhanced the performance of the proposed LSS focused crawler.
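The three evaluation metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact formulas, so the definitions below are assumptions based on common usage in the focused-crawling literature: harvest rate as the fraction of crawled pages judged relevant, target recall as the fraction of a known target set that the crawler retrieved, and F1-score as the harmonic mean of the two. The function names and the example URLs are hypothetical.

```python
def harvest_rate(crawled, relevant):
    """Fraction of crawled pages that are judged relevant (assumed definition)."""
    if not crawled:
        return 0.0
    hits = sum(1 for url in crawled if url in relevant)
    return hits / len(crawled)


def target_recall(crawled, targets):
    """Fraction of a known target set that the crawler retrieved (assumed definition)."""
    if not targets:
        return 0.0
    return len(set(crawled) & set(targets)) / len(targets)


def f1_score(precision_like, recall_like):
    """Harmonic mean of a precision-like and a recall-like score."""
    total = precision_like + recall_like
    if total == 0:
        return 0.0
    return 2 * precision_like * recall_like / total


# Hypothetical toy crawl: four pages fetched, two judged relevant,
# and one of the two held-out target pages reached.
crawled = ["a", "b", "c", "d"]
relevant = {"a", "b"}
targets = {"a", "e"}

hr = harvest_rate(crawled, relevant)
tr = target_recall(crawled, targets)
print(hr, tr, f1_score(hr, tr))
```

Under these assumed definitions, the harmonic mean of the reported averages (0.35 and 0.55) is roughly 0.43, close to the reported F1-score of 0.42. The small difference would be consistent with averaging per-topic F1-scores, although the abstract does not state how the average was computed.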