Abstract: People today rely heavily on the internet and search engines to obtain information, directly or indirectly. However, the content that search engines make accessible to users, commonly known as the surface web, constitutes only 4% of the information on the internet. The remaining content, which eludes search engines, is called the deep web. The deep web encompasses deliberately hidden information, such as personal email accounts, social media accounts, online banking accounts, and other confidential data. It also hosts critical applications, including the databases of universities, banks, and civil registries, which are off-limits to the public and illegal to access without authorization. The dark web is a subset of the deep web that provides an ideal platform for criminals and smugglers to engage in illicit activities such as drug trafficking, weapon smuggling, selling stolen bank cards, and money laundering. In this article, we propose a search engine that employs deep learning to detect the titles of activities on the dark web. We focus on five categories of activity: drug trading, weapon trading, selling stolen bank cards, selling fake IDs, and selling illegal currencies. Our aim is to extract relevant images from websites with ".onion" addresses and, for websites without images, to identify the activity title by extracting keywords from the page text. Furthermore, we introduce an image dataset called Darkoob, which we gathered and used to evaluate the proposed method. Our experimental results show that the proposed method achieves an accuracy of 94% on the test dataset.

Dark Web Activity Classification Using Deep Learning