Abstract: Malicious domains are part of the landscape of the internet but are becoming more prevalent and more dangerous to both companies and individuals. They can be hosted on a variety of technologies and serve an array of content, ranging from malware and command and control to complex phishing sites that are designed to deceive and expose. Tracking, blocking and detecting such domains is complex, and very often involves intricate allow or deny list management or SIEM integration with open-source TLS fingerprinting techniques. Many fingerprinting techniques such as JARM and JA3 are used by threat hunters to determine domain classification, but with the increase in TLS similarity, particularly in CDNs, they are becoming less useful. The aim of this paper is to adapt and evolve open-source TLS fingerprinting techniques with increased features to enhance granularity, and to produce a similarity mapping system that enables the tracking and detection of previously unknown malicious domains. This is done by enriching TLS fingerprints with HTTP header data and producing a fine-grained similarity visualisation that represents high-dimensional data using MinHash and locality-sensitive hashing. Influence was taken from the chemistry domain, where the problem of high-dimensional similarity in chemical fingerprints is often encountered. An enriched fingerprint was produced which was then visualised across three separate datasets. The results were analysed and evaluated, with 67 previously unknown malicious domains being detected based solely on their similarity to known malicious domains. The similarity mapping technique produced demonstrates definite promise in the arena of early detection of malware and phishing domains.
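
To make the granularity argument concrete, here is a minimal sketch, not the authors' implementation, of why a single digest over a fingerprint is brittle while a set-based similarity degrades gracefully. The feature tokens, the `tls:`/`http:` prefixes and the helper names `exact_hash` and `jaccard` are all hypothetical, and SHA-256 merely stands in for whatever digest an exact-match fingerprint uses.

```python
import hashlib


def exact_hash(features):
    """Return one digest over the sorted features; any change alters it."""
    joined = ",".join(sorted(features))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two feature sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical enriched fingerprints: TLS features plus HTTP header tokens.
site_a = {
    "tls:version=1.3",
    "tls:cipher=TLS_AES_128_GCM_SHA256",
    "tls:alpn=h2",
    "http:server=nginx",
    "http:x-frame-options=DENY",
}
# Same deployment, but the Server header now exposes a version string.
site_b = (site_a - {"http:server=nginx"}) | {"http:server=nginx/1.18.0"}

print(exact_hash(site_a) == exact_hash(site_b))  # False: the digests differ
print(round(jaccard(site_a, site_b), 3))         # 0.667: 4 shared of 6 total
```

An exact-match lookup treats the two sites as unrelated, whereas the set comparison still reports them as close neighbours. That is the property a similarity map depends on.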

What problem does this paper attempt to address?

The paper addresses the following problem: with the wide adoption of the TLS (Transport Layer Security) protocol and the growth of CDN (Content Delivery Network) technology, existing TLS fingerprinting techniques have become less precise at detecting malicious domains, especially where many domains present highly similar TLS features. The paper therefore proposes a new TLS-based fingerprinting method that combines feature expansion and similarity mapping to improve malicious domain detection.

### Specific Problem Description

1. **Limitations of TLS Fingerprinting Techniques**
   - Existing TLS fingerprinting techniques such as JARM and JA3 rely on hash values and are sensitive to minor configuration changes, which leads to false positives or false negatives.
   - In a CDN environment, multiple domains share the same TLS configuration, making it difficult for traditional fingerprinting techniques to distinguish malicious domains from benign ones.
2. **Challenges in Malicious Domain Detection**
   - Malicious domains increasingly use TLS encryption and CDN technology to hide their true identities, which makes detection harder.
   - Existing detection methods are mostly reactive and rely on third-party reports or open-source tools, so they are neither proactive nor real-time.

### Goals of the Paper

The paper aims to improve TLS fingerprinting techniques in the following ways:

- **Increasing Feature Granularity**: Expanding the TLS feature set and adding HTTP header data to make fingerprints more fine-grained.
- **Introducing Similarity Mapping**: Using techniques such as Locality-Sensitive Hashing (LSH) and MinHash to construct similarity mappings of high-dimensional data, and thereby better detect unknown malicious domains.
- **Improving Detection Ability**: Improving the detection accuracy for malicious domains through enhanced feature similarity mapping, especially in CDN environments.

### Main Contributions

1. **Literature Review**: Evaluate existing active scanning techniques and analyze their advantages and disadvantages.
2. **Design and Development**: Propose and implement a new TLS fingerprinting method that combines feature expansion and similarity mapping.
3. **Experimental Verification**: Verify the effectiveness of the new method through experiments and compare it with existing methods.
4. **Result Analysis**: Analyze the experimental results in detail, summarize the advantages and disadvantages of the new method, and suggest improvements.

Through these improvements, the paper aims to detect malicious domains and prevent their spread at an early stage, especially given how widely TLS encryption and CDN technology are now used.
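
The similarity-mapping goal can be sketched with a small, dependency-free MinHash and LSH banding example. This is an illustration under stated assumptions rather than the paper's pipeline: the token format, domain names, the 64-value signature, the 16-band split and the helper names `minhash_signature`, `estimate_jaccard` and `lsh_candidates` are all hypothetical choices made for the example.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

NUM_PERM = 64  # signature length (assumed value)
BANDS = 16     # LSH bands, so 4 rows per band (assumed value)


def _hash(token, seed):
    """Seeded 64-bit hash of a token; each seed acts as one hash function."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens, num_perm=NUM_PERM):
    """MinHash signature: the minimum hash of the set under each seed."""
    tokens = set(tokens)
    if not tokens:
        raise ValueError("cannot sign an empty feature set")
    return [min(_hash(t, seed) for t in tokens) for seed in range(num_perm)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of matching signature slots estimates Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def lsh_candidates(signatures, bands=BANDS):
    """Return name pairs whose signatures agree on at least one whole band."""
    sig_len = len(next(iter(signatures.values())))
    if sig_len % bands != 0:
        raise ValueError("signature length must be divisible by bands")
    rows = sig_len // bands
    buckets = defaultdict(set)
    for name, sig in signatures.items():
        for b in range(bands):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].add(name)
    pairs = set()
    for names in buckets.values():
        pairs.update(combinations(sorted(names), 2))
    return pairs


if __name__ == "__main__":
    # Hypothetical enriched fingerprints keyed by made-up domain names.
    fingerprints = {
        "known-bad.example": {"tls:1.3", "alpn:h2", "srv:nginx", "hdr:x-a"},
        "lookalike.example": {"tls:1.3", "alpn:h2", "srv:nginx", "hdr:x-b"},
        "unrelated.example": {"tls:1.2", "alpn:http/1.1", "srv:iis"},
    }
    sigs = {name: minhash_signature(f) for name, f in fingerprints.items()}
    print(lsh_candidates(sigs))
```

Banding trades recall for speed: more bands with fewer rows each surface lower-similarity pairs as candidates, while fewer, wider bands keep only near-duplicates. Candidate pairs that include a known malicious domain would then be the ones worth reviewing by hand.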

A novel TLS-based Fingerprinting approach that combines feature expansion and similarity mapping

TLS fingerprint for encrypted malicious traffic detection with attributed graph kernel

A Fingerprint Enhancement and Second-Order Markov Chain Based Malicious Encrypted Traffic Identification Scheme

Detecting Malignant TLS Servers Using Machine Learning Techniques

Machine learning interpretability meets TLS fingerprinting

Unsupervised Detection and Clustering of Malicious TLS Flows

Active TLS Stack Fingerprinting: Characterizing TLS Server Deployments at Scale

Adaptive Webpage Fingerprinting from TLS Traces

Deciphering Malware's use of TLS (without Decryption)

Unveiling Web Fingerprinting in the Wild Via Code Mining and Machine Learning

Detecting Coordinated Internet-Wide Scanning by TCP/IP Header Fingerprint

Counterfeit Fingerprint Detection of Outbound HTTP Traffic with Graph Edit Distance

A survey of methods for encrypted network traffic fingerprinting

Unmasking phishers: ML for malicious certificate detection

Convolutional neural network-based identification of malicious traffic for TLS encryption

Fingerprinting Internet DNS Amplification DDoS Activities

Detecting Malicious Domains with Behavioral Modeling and Graph Embedding

A Novel Framework for Malicious Encrypted Traffic Classification at Host Level and Flow Level

A Federated Learning Approach for Multi-stage Threat Analysis in Advanced Persistent Threat Campaigns

HTTPSmell: A Deep Learning Approach on Malicious HTTP Traffic Detection via Data Augmentation and Label Refactoring

Joint Detection of Malicious Domains and Infected Clients