Reinforcement Learning in Deep Web Crawling: Survey

Kapil Madan, Rajesh Bhatia
DOI: https://doi.org/10.1007/978-981-16-3346-1_24
2021-09-20
Abstract: Context: Reinforcement learning (RL) can help solve various challenges of deep web crawling. Deep web content is accessed by filling in search forms rather than by following hyperlinks. Understanding the search form and selecting queries properly are necessary steps to retrieve deep web content successfully, which makes crawling the deep web a very challenging task. RL-based techniques help in filling in the search form and retrieving the deep web content. In RL, the agent selects an action based on the given state, and the environment assigns a reward or penalty to the selected action. Objective: This study reports a survey of RL-based techniques applied in the domain of deep web crawling. Method: The literature survey is based on 31 articles drawn from 77 articles published in various reputed journals, conferences, and workshops. Results: Challenges related to the various steps of deep web crawling are presented. RL-based techniques are used in multiple research papers to address deep web crawling challenges. A comparative analysis of the RL techniques used in deep web crawling is carried out based on strengths, metrics, datasets, and research gaps. Conclusion: Various RL-based techniques that have not yet been explored can be applied to deep web crawling. Open challenges and research directions are also recommended.