SpyDark: Surface and Dark Web Crawler

Ashwini Dalvi, Swapneel Paranjpe, Riddhi Amale, Sarvesh Kurumkar, Faruk Kazi, S.G. Bhirud
DOI: https://doi.org/10.1109/icsccc51823.2021.9478098
2021-05-21
Abstract: Researchers have established that a portion of the hidden web exists, known as the "Dark Web". The nature of the Dark Web has prompted researchers to collect information from it and to draw inferences about how information spreads and how communication takes place on this side of the internet. The presented work, SpyDark, is one such attempt to collect information from both the surface web and the dark web. The user enters a search query, the number of web pages to visit, and the network to access (surface or dark web). The crawler traverses the web pages and extracts the information on each page, such as text, images, and hyperlinks. The hyperlinks are stored in a database and later visited by the crawler. The text is fed to a pre-trained NLP (Natural Language Processing) model, and the model's output is used to compute a relevance score, on the basis of which each page is categorized as relevant or irrelevant.
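The pipeline described in the abstract (query and page limit as input, extraction of text, images, and hyperlinks, a store of links to visit later, and a relevance decision) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The names `PageParser`, `relevance_score`, `crawl`, and the `threshold` parameter are hypothetical, the in-memory queue stands in for the link database, and a simple query-term overlap stands in for the pre-trained NLP model, whose details the abstract does not give. How the chosen network is reached is left to a caller-supplied `fetch` function; for the dark web this would presumably mean routing requests through Tor, which the abstract does not describe.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Collects visible text, hyperlinks, and image URLs from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.text_parts, self.links, self.images = [], [], []
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "img" and attrs.get("src"):
            self.images.append(urljoin(self.base_url, attrs["src"]))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.text_parts.append(data.strip())


def relevance_score(text, query):
    """Placeholder for the NLP model: fraction of query terms found in the text."""
    terms = set(query.lower().split())
    words = set(text.lower().split())
    return len(terms & words) / len(terms) if terms else 0.0


def crawl(seeds, query, max_pages, fetch, threshold=0.5):
    """Visit up to max_pages pages breadth-first and score each against the query.

    fetch(url) must return the page HTML as a string, or None on failure.
    """
    frontier = deque(seeds)  # stands in for the link database
    seen = set(seeds)
    results = []
    while frontier and len(results) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = PageParser(url)
        parser.feed(html)
        score = relevance_score(" ".join(parser.text_parts), query)
        results.append({
            "url": url,
            "score": score,
            "relevant": score >= threshold,  # assumed cut-off, not from the paper
            "images": parser.images,
        })
        for link in parser.links:  # queue newly discovered links for later visits
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return results


# Usage with in-memory pages instead of a real network fetch.
pages = {
    "http://a.example/": '<p>Dark web market listing</p>'
                         '<a href="/b">next</a><img src="logo.png">',
    "http://a.example/b": "<p>Cooking recipes</p>",
}
results = crawl(["http://a.example/"], "dark web", 2, pages.get)
for r in results:
    print(r["url"], r["score"], r["relevant"])
```

Injecting `fetch` keeps the traversal and scoring logic independent of the network being crawled, so the same loop could serve both the surface and dark web cases under these assumptions.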