Learning-based Focused Web Crawler
Naresh Kumar, Dhruv Aggarwal
DOI: https://doi.org/10.1080/03772063.2021.1885312
IF: 1.8768
2021-02-22
IETE Journal of Research
Abstract: As the number of pages published every day grows enormously, there is a persistent need for an efficient crawling mechanism that yields appropriate search results for everyday queries. Users regularly encounter inappropriate or incorrect answers among search results, so there is a strong need for enhanced methods that provide precise results within an acceptable time frame. In this project, we present an effective approach to building a crawler that considers factors that have not been considered before. The main focus is the design of an intelligent focused crawler that learns by itself to improve the ranking of URLs. Moreover, many existing crawlers first visit the seed URL, read the pages, and download them for indexing by search engines. The problem is that a website or page that is not updated regularly is still crawled, even though it was already downloaded during a previous visit. This wastes bandwidth, network resources, time, and storage. We therefore aim to minimize these problems with a revisit policy for web crawlers. First, during the first crawl, websites are divided into three categories (frequently, frequent, static); the crawler then decides for each website when it has to be crawled again.
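The revisit policy described in the abstract can be illustrated with a minimal sketch. The three category names come from the abstract; everything else here is an assumption made for illustration, not the authors' method: the classification rule (based on how recently a page was modified before the first crawl), the thresholds, the revisit intervals, and the function names `classify`, `next_crawl_time`, and `due_for_crawl` are all hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical revisit intervals per category; the abstract gives no values.
REVISIT_INTERVAL = {
    "frequently": timedelta(hours=1),
    "frequent": timedelta(days=1),
    "static": timedelta(days=30),
}


def classify(last_modified: datetime, first_crawl: datetime) -> str:
    """Assign a category during the first crawl.

    Assumed rule (not from the paper): the more recently the page was
    modified before the first crawl, the more often it is revisited.
    """
    age = first_crawl - last_modified
    if age <= timedelta(days=1):
        return "frequently"
    if age <= timedelta(days=30):
        return "frequent"
    return "static"


def next_crawl_time(category: str, last_crawl: datetime) -> datetime:
    """Return the earliest time at which the site should be crawled again."""
    return last_crawl + REVISIT_INTERVAL[category]


def due_for_crawl(category: str, last_crawl: datetime, now: datetime) -> bool:
    """Skip sites whose revisit interval has not yet elapsed."""
    return now >= next_crawl_time(category, last_crawl)


if __name__ == "__main__":
    first_crawl = datetime(2021, 2, 22, 12, 0)
    category = classify(datetime(2021, 2, 10), first_crawl)
    print(category, next_crawl_time(category, first_crawl))
```

Under these assumptions, a site classified as static is not downloaded again until its interval has elapsed, which is where the savings in bandwidth, time, and storage would come from.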