Abstract: In contemporary times, people rely heavily on the internet and search engines to obtain information, either directly or indirectly. However, the information accessible to users through search engines, commonly known as the surface web, constitutes merely 4% of the overall information present on the internet. The remaining information, which search engines do not index, is called the deep web. The deep web includes deliberately hidden information, such as personal email accounts, social media accounts, online banking accounts, and other confidential data. It also hosts several critical applications, including the databases of universities, banks, and civil records, which are off-limits to the public and illegal to access without authorization. The dark web is a subset of the deep web that provides an ideal platform for criminals and smugglers to engage in illicit activities, such as drug trafficking, weapon smuggling, selling stolen bank cards, and money laundering. In this article, we propose a search engine that employs deep learning to detect the titles (that is, the activity categories) of dark web pages. We focus on five categories of activities: drug trading, weapon trading, selling stolen bank cards, selling fake IDs, and selling illegal currencies. Our aim is to extract relevant images from websites with a ".onion" address and to identify the titles of websites without images by extracting keywords from the text of their pages. Furthermore, we introduce Darkoob, a dataset of images that we gathered and used to evaluate the proposed method. Our experimental results show that the proposed method achieves an accuracy of 94% on the test dataset.
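The abstract states that pages without images are labeled by extracting keywords from their text, but it does not specify how the keywords are extracted or matched to categories. The following is a minimal sketch under stated assumptions, not the authors' method: it keeps only URLs whose host ends in ".onion", takes the most frequent non-stopword tokens as keywords, and assigns the category whose keyword lexicon overlaps most. The names `CATEGORY_KEYWORDS`, `STOPWORDS`, `is_onion_url`, `extract_keywords`, and `classify_text_page`, as well as every word list, are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter
from typing import List, Optional
from urllib.parse import urlparse

# Hypothetical keyword lexicons for the five activity categories in the abstract.
# Illustrative assumptions only; not the word lists used by the authors.
CATEGORY_KEYWORDS = {
    "drugs": {"cocaine", "heroin", "mdma", "cannabis", "pills"},
    "weapons": {"pistol", "rifle", "ammo", "firearm", "glock"},
    "stolen_cards": {"cvv", "dumps", "fullz", "carding", "cards"},
    "fake_ids": {"passport", "passports", "license", "ids", "ssn"},
    "illegal_currency": {"counterfeit", "banknotes", "bills", "euro", "dollars"},
}

# Small illustrative stopword list so common function words are not chosen as keywords.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "on", "with", "we"}


def is_onion_url(url: str) -> bool:
    """Return True if the URL's hostname ends with the .onion top-level domain."""
    host = urlparse(url).hostname or ""
    return host.endswith(".onion")


def extract_keywords(text: str, top_k: int = 10) -> List[str]:
    """Return the top_k most frequent lowercase word tokens, ignoring stopwords."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    return [word for word, _ in Counter(tokens).most_common(top_k)]


def classify_text_page(text: str, top_k: int = 10) -> Optional[str]:
    """Pick the category whose lexicon overlaps most with the page's top keywords.

    Returns None when none of the extracted keywords match any category.
    """
    keywords = set(extract_keywords(text, top_k))
    scores = {cat: len(keywords & words) for cat, words in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    page = "Glock pistol and rifle ammo in stock"
    print(extract_keywords(page))
    print(classify_text_page(page))
```

Hand-written lexicons keep the sketch transparent and dependency-free; a practical system would more likely learn the text classifier from labeled pages, in the same spirit as the deep learning model the abstract describes for images.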

Research on Deep Web Classification Based on Domain Feature Text

Deep Web Data Sources Classification Based on Text VSM of Query Interface

Web Page Classification Based on Heterogeneous Features and a Combination of Multiple Classifiers

Knowledge-based Document Embedding for Cross-Domain Text Classification

Effective Approach to Deep Web Entries Identification

A Machine Learning Approach Classification of Deep Web Sources

Improving short text classification using public search engines

Attributes extraction of Deep Web query interface based on DOM

Text Categorization Based on Domain Ontology

Text Classification: A Perspective of Deep Learning Methods

Combining Topic Models and String Kernel for Deep Web Categorization

Pattern Matching Method for Deep Web Interface Integration

Dark Web Activity Classification Using Deep Learning

Hierarchical Classification of Research Fields in the "Web of Science" Using Deep Learning

Research on Chinese Text Classification Based on WAE and SVM

Short Text Classification Based on Strong Feature Thesaurus

A Survey on Deep Text Matching

Study and System Implementation of Chinese Web-page Classification

A Survey on Text Classification: From Traditional to Deep Learning

A Survey on Text Classification: From Shallow to Deep Learning

Comparative Study between Traditional Machine Learning and Deep Learning Approaches for Text Classification