Abstract: Cyber attacks have become a vital threat to connected autonomous vehicles in intelligent transportation systems. Cyber threat intelligence, as the collection of cyber threat information, provides an ideal approach for responding to emerging vehicle cyber threats and enabling proactive security defense. Obtaining valuable information from enormous cybersecurity data using knowledge extraction technologies to achieve cyber threat intelligence modeling is an effective means to ensure automotive cybersecurity. Unfortunately, there is no existing cybersecurity dataset available for cyber threat intelligence modeling research in the automotive field. This paper reports the creation of a cyber threat intelligence corpus focusing on vehicle cybersecurity knowledge mining. This dataset, annotated using a joint labeling strategy, comprises 908 real automotive cybersecurity reports, containing 3678 sentences, 8195 security entities and 4852 semantic relations. We further conduct a comprehensive analysis of cyber threat intelligence mining algorithms based on this corpus. The proposed dataset will serve as a valuable resource for evaluating the performance of existing algorithms and advancing research in cyber threat intelligence modeling within the automotive field.


### What problems does this paper attempt to solve?

This paper addresses the cybersecurity threats faced by Connected Autonomous Vehicles (CAVs) in intelligent transportation systems. Specifically, it supports cybersecurity knowledge mining and modeling research by constructing a Cyber Threat Intelligence (CTI) dataset for the automotive field.

#### Background and Challenges

Connected autonomous vehicles have shown great potential for improving traffic efficiency, reducing congestion, and lowering the accident rate. However, they also introduce new cybersecurity risks and vulnerabilities. Hackers can exploit these attack surfaces and even gain control of a vehicle. In recent years, the frequency, scale, and complexity of cyber attacks against connected autonomous vehicles have increased exponentially, which may lead to privacy leakage, economic losses, and personal injury, and may even endanger national security.

Existing security measures such as access control, firewalls, Intrusion Detection and Prevention Systems (IDPS), and Security Operations Centers (SOC) are effective but limited: they protect passively and have limited threat identification capabilities. There is therefore an urgent need for a method that enables active defense and a timely response to unknown or emerging threats.

#### The Role of Cyber Threat Intelligence (CTI)

As a collection of cyber threat information, cyber threat intelligence provides an ideal way to deal with emerging vehicle cybersecurity threats. Extracting valuable information from large volumes of cybersecurity data and using it for CTI modeling is an effective means of ensuring automotive cybersecurity. However, there is currently no CTI dataset for the automotive field, which limits the progress of related research.

#### Main Contributions of the Paper

To solve this problem, the authors created an automotive CTI dataset named **Acti**, which focuses on mining entities related to automotive cybersecurity and the relations between them. The dataset contains 908 real-world automotive cybersecurity reports, comprising 3,678 sentences, 8,195 security entities, and 4,852 semantic relations. The data are annotated using a joint annotation strategy that covers 10 entity concepts and 10 semantic relation categories, based on the defined automotive CTI ontology model.

In addition, the authors conducted a comprehensive analysis of CTI mining algorithms on this dataset and trained two CTI mining models to verify its reliability. The dataset fills the gap in automotive CTI datasets and provides a valuable resource for evaluating existing algorithms and promoting CTI modeling research.

### Key Formulas and Technical Details

- **Joint Annotation Format**: A label scheme for annotating entity boundaries, entity types, relationship types, and entity roles.
  - Entity boundaries use the "BIOES" (Begin, Inside, Other, End, Single) format.
  - Entity types are divided into 10 categories: Component, Consequence, Identity, Vehicle, Location, Attack Vector, Attack Pattern, Tool, Vulnerability, Course of Action.
  - Relationship types include: "hasVulnerability", "hasInterface", "hasImpact", "targets", "uses", "mitigates", "related-to", "located-at", "based-on", "consists-of".
- **Deep Learning Model**:
  - **BERT-BiLSTM-att-CRF** model structure:
    - **Embedding Layer**: Converts the input text into word vector representations.
    - **Encoder Layer**: Uses a Bidirectional Long Short-Term Memory network (BiLSTM) to capture semantic information in the sequence.
    - **Attention Mechanism Layer**: Introduces a self-attention mechanism to focus on key information.
    - **Decode
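
The joint annotation format described above (entity boundary, entity type, relationship type, and entity role combined in one per-token label) can be illustrated with a minimal sketch. The exact label string used in the dataset is not given here, so everything concrete in this example is an assumption made for illustration: the `Span` dataclass, the `|` separator (chosen because entity types such as "Attack Vector" contain spaces and relation names such as "related-to" contain hyphens), the role numbering (1 for the head entity, 2 for the tail entity), and the sample sentence.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    """An annotated entity span (hypothetical structure, not the dataset's own)."""
    start: int                      # index of the first token, inclusive
    end: int                        # index of the last token, inclusive
    entity_type: str                # one of the 10 entity categories
    relation: Optional[str] = None  # one of the 10 relation types, if any
    role: Optional[int] = None      # assumed convention: 1 = head, 2 = tail


def boundary_tags(length: int) -> List[str]:
    """Return BIOES boundary markers for an entity of the given token length."""
    if length == 1:
        return ["S"]
    return ["B"] + ["I"] * (length - 2) + ["E"]


def joint_tags(n_tokens: int, spans: List[Span]) -> List[str]:
    """Build one joint label per token; tokens outside any span get 'O'."""
    tags = ["O"] * n_tokens
    for span in spans:
        parts = [span.entity_type]
        if span.relation is not None and span.role is not None:
            parts += [span.relation, str(span.role)]
        suffix = "|".join(parts)
        markers = boundary_tags(span.end - span.start + 1)
        for offset, marker in enumerate(markers):
            tags[span.start + offset] = f"{marker}|{suffix}"
    return tags


# Made-up sentence: an Identity entity "targets" a Component entity.
tokens = ["Hackers", "target", "the", "telematics", "control", "unit"]
spans = [
    Span(0, 0, "Identity", relation="targets", role=1),
    Span(3, 5, "Component", relation="targets", role=2),
]
tags = joint_tags(len(tokens), spans)

for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Encoding the relation and role into the token label turns joint entity and relation extraction into a single sequence-labeling task, which is the kind of input a tagger such as BERT-BiLSTM-att-CRF consumes. This sketch assumes non-overlapping spans and at most one relation per entity.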

A dataset for cyber threat intelligence modeling of connected autonomous vehicles
