STRisk: A Socio-Technical Approach to Assess Hacking Breaches Risk

Hicham Hammouchi, Narjisse Nejjari, Ghita Mezzour, Mounir Ghogho, Houda Benbrahim
DOI: https://doi.org/10.1109/TDSC.2022.3149208
2024-11-19
Abstract: Data breaches have begun to take on new dimensions and their prediction is becoming of great importance to organizations. Prior work has addressed this issue mainly from a technical perspective and neglected other interfering aspects such as the social media dimension. To fill this gap, we propose STRisk, a predictive system in which we expand the scope of the prediction task by bringing into play the social media dimension. We study over 3800 US organizations, including both victim and non-victim organizations. For each organization, we design a profile composed of a variety of externally measured technical indicators and social factors. In addition, to account for unreported incidents, we consider the non-victim sample to be noisy and propose a noise correction approach to correct mislabeled organizations. We then build several machine learning models to predict whether an organization is likely to experience a hacking breach. By exploiting both technical and social features, we achieve an Area Under the Curve (AUC) score exceeding 98%, which is 12% higher than the AUC achieved using only technical features. Furthermore, our feature importance analysis reveals that open ports and expired certificates are the best technical predictors, while spreadability and agreeability are the best social predictors.
Cryptography and Security, Machine Learning
What problem does this paper attempt to address?
The paper addresses the prediction and risk assessment of data breaches, specifically those caused by hacking. The authors point out that existing research approaches this problem mainly from a technical perspective and overlooks social factors, such as social media, that also bear on breach risk. To close this gap, the paper proposes STRisk, a prediction system that combines the technical and social media dimensions to assess an organization's risk of a hacking breach more comprehensively.

### Main Problems and Solutions

1. **Limitations of Existing Research**:
   - Existing work focuses mainly on technical indicators, such as improper network management and malicious activities, while ignoring social factors such as social media.
   - Existing work also ignores unreported breaches, which introduces noisy labels into the dataset: some organizations labeled as non-victims were in fact breached but never reported it.
2. **Introduction of the STRisk System**:
   - **Expanding the Prediction Scope**: Introducing the social media dimension (such as information on Twitter) expands the scope of the prediction task.
   - **Handling Noisy Labels**: To account for unreported breaches, the non-victim sample is treated as noisy, and a noise correction method is proposed to correct mislabeled organizations.
   - **Feature Extraction**: A feature set combining technical and social media signals is constructed to assess each organization's technical security posture and social reputation.
3. **Experimental Results**:
   - Using both technical and social media features, STRisk achieves an AUC score exceeding 98%, which is 12% higher than the AUC achieved with technical features alone.
   - Feature importance analysis shows that open ports and expired certificates are the best technical predictors, while spreadability and agreeability are the best social predictors.

### Summary

The core objective of this paper is to provide a more comprehensive and accurate tool for assessing and predicting hacking breach risk by combining the technical and social media dimensions. STRisk fills a gap in existing research and gives organizations a broader view of their risk, helping them better prevent and respond to potential breaches.
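
The noisy-label idea and the AUC metric mentioned above can be illustrated with a small sketch. This is not the authors' noise correction method, which is not detailed here; it is a generic confidence-threshold relabeling, and the function names, the `threshold` value, and the toy scores are all hypothetical. AUC is computed as the probability that a randomly chosen victim receives a higher score than a randomly chosen non-victim.

```python
from typing import List, Sequence


def auc_score(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (victim, non-victim) pairs ranked correctly.

    Ties between a victim score and a non-victim score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one victim and one non-victim")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def relabel_suspect_negatives(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.9
) -> List[int]:
    """Flip a non-victim label (0) to victim (1) when its score >= threshold.

    Victim labels are never changed: reported breaches are treated as reliable,
    and only the non-victim sample is assumed to be noisy.
    """
    return [1 if (y == 0 and s >= threshold) else y for y, s in zip(labels, scores)]


# Toy data (hypothetical): the third organization is labeled non-victim
# but receives a high predicted breach score.
labels = [1, 1, 0, 0, 0, 0]
scores = [0.95, 0.80, 0.92, 0.30, 0.20, 0.10]

corrected = relabel_suspect_negatives(labels, scores, threshold=0.9)
print(corrected)
print(auc_score(labels, scores), auc_score(corrected, scores))
```

In a real pipeline the scores used for relabeling would normally come from held-out (for example, cross-validated) predictions rather than from a model trained on the same labels, and the final AUC would be measured on data not used for the correction.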