Abstract: Strong encryption algorithms and reliable anonymity routing have made cybercrime investigation more challenging. Hence, one option for law enforcement agencies (LEAs) is to search through unencrypted content on the Internet or anonymous communication networks (ACNs). The capability of automatically harvesting web content from web servers enables LEAs to collect and preserve data that may serve as leads, clues, or evidence in an investigation. Although scientific studies began exploring web crawling soon after the inception of the web, few have thoroughly scrutinised web crawling on the "dark web", or ACNs, such as I2P, IPFS, Freenet, and Tor. The current paper presents a systematic literature review (SLR) that examines the prevalence and characteristics of dark web crawlers. Of 58 peer-reviewed articles mentioning crawling and the dark web, 34 remained after irrelevant articles were excluded. The literature review showed that most dark web crawlers were programmed in Python, using either Selenium or Scrapy as the web scraping library. The knowledge gathered from the systematic literature review was used to develop a Tor-based web crawling model and integrate it into an existing software toolset customised for ACN-based investigations. Finally, the performance of the model was examined through a set of experiments. The results indicate that the developed crawler successfully scraped web content from both clear and dark web pages, including dark marketplaces on the Tor network. The scientific contribution of this paper consists of novel knowledge concerning ACN-based web crawlers. Furthermore, it presents a model for crawling and scraping clear and dark websites for the purpose of digital investigations. The conclusions include practical implications of dark web content retrieval and archival, such as investigation clues and evidence, and related future research topics.
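To illustrate the kind of crawler the abstract describes, the following is a minimal sketch in Python of a breadth-first crawl that follows links across clear and .onion pages. It is not the paper's model: the names `LinkParser`, `is_onion`, `crawl`, and `TOR_PROXIES` are hypothetical, the page-fetching function is supplied by the caller, and the proxy address assumes a local Tor client listening on its default SOCKS port (9050).

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumption: a local Tor client exposes its default SOCKS port (9050).
# The "socks5h" scheme asks the proxy to resolve hostnames, which
# .onion addresses require. This mapping is only used by the caller's fetcher.
TOR_PROXIES = {
    "http": "socks5h://127.0.0.1:9050",
    "https": "socks5h://127.0.0.1:9050",
}


class LinkParser(HTMLParser):
    """Collects the href value of every anchor tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def is_onion(url):
    """Return True if the URL points at a Tor onion service."""
    host = urlparse(url).hostname or ""
    return host.endswith(".onion")


def crawl(seed, fetch, max_pages=10):
    """Breadth-first crawl starting at seed.

    fetch(url) is a caller-supplied function returning the page HTML as a
    string, or None if the page could not be retrieved.
    """
    queue, seen, pages = deque([seed]), {seed}, {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html  # preserve the raw content for later analysis
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            if urlparse(link).scheme in ("http", "https") and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

Keeping the fetcher separate from the crawl loop means the same logic can run against clear web pages directly or against onion services through Tor. With the `requests` library and its SOCKS extra installed, a fetcher could be as simple as `lambda url: requests.get(url, proxies=TOR_PROXIES, timeout=60).text`, and `is_onion` can be used to label each archived page as clear or dark.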

SpyDark: Surface and Dark Web Crawler

Dark Web Illegal Activities Crawling and Classifying Using Data Mining Techniques

CRATOR: a Dark Web Crawler

Dark Web Activity Classification Using Deep Learning

A Crawler Architecture for Harvesting the Clear, Social, and Dark Web for IoT-Related Cyber-Threat Intelligence

Forensic investigation of the dark web on the Tor network: pathway toward the surface web

Exploring Dark Web Crawlers: A Systematic Literature Review of Dark Web Crawlers and Their Implementation

Plunge into the Underworld: A Survey on Emergence of Darknet

Design and Implementation of Domain based Semantic Hidden Web Crawler

A Big Data Architecture for Early Identification and Categorization of Dark Web Sites

Zooming Into the Darknet: Characterizing Internet Background Radiation and its Structural Changes

Darknet Data Mining -- A Canadian Cyber-crime Perspective

Web Crawler and Web Crawler Algorithms: A Perspective

A Comparative Study of Hidden Web Crawlers

PDD Crawler: A focused web crawler using link and content analysis for relevance prediction

Dark Web Marketplaces: Data for Collaborative Threat Intelligence

A general and modular framework for dark web analysis

A novel design of hidden web crawler using ontology

The Language of Legal and Illegal Activity on the Darknet