Abstract: This thesis reports a systematic study of data protection in three different but related scenarios. The first scenario is guarding relational data against unauthorized redistribution, or data piracy. In this scenario, the data are released to users who have full control over their use but are prohibited from redistributing them. A novel technique is presented for embedding fingerprints in database relations so that the original recipient of the data can be identified. In our scheme, a single secret key (used with or without the primary-key attributes of the relation) determines how a fingerprint is embedded. Rigorous analysis shows that, with high probability, a detected fingerprint is indeed the one originally embedded, and that embedded fingerprints cannot be modified or erased by a variety of attacks. These attacks include bit flipping, tuple addition and deletion, secret-key guessing, and collusion among multiple recipients of the same relation. In addition, a robust watermarking scheme for relational databases is presented as a special case of fingerprinting. The second scenario is protecting the privacy of individual data values against interval-based inference from aggregation queries. Unlike the first scenario, in which the data are released to users, this scenario concerns data that remain under their owner's control, and users can access the data through aggregation queries. An individual data value is said to be compromised if an accurate enough interval, called the inference interval, can be obtained from aggregation results such that the actual value must fall within it. Our study shows that auditing interval-based inference is intractable for bounded integer values. For bounded real values, the auditing problem can be solved in polynomial time, although the solution involves mathematical programming with a large number of constraints and/or variables. The last scenario is detecting anomalies in usage log data.
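For the first scenario, the abstract says only that a secret key decides how a fingerprint is embedded. The following minimal Python sketch illustrates the general idea of keyed, bit-level marking used in relational watermarking and fingerprinting. It is not the thesis's exact algorithm. A keyed hash (HMAC-SHA256) of the secret key and each tuple's primary key decides four things: whether the tuple is marked, which attribute carries the mark, which of that attribute's `xi` least significant bits is changed, and which fingerprint bit is embedded. The names `embed`, `detect`, `gamma`, and `xi` are hypothetical. The sketch also assumes that a primary key is available and that all attribute values are non-negative integers.

```python
import hashlib
import hmac
from typing import Dict, List, Optional, Tuple

Table = Dict[int, List[int]]  # primary key -> non-negative integer attribute values


def _prf(key: bytes, *parts: object) -> int:
    """Keyed pseudo-random integer: HMAC-SHA256 over the joined parts."""
    msg = "|".join(str(p) for p in parts).encode()
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest(), "big")


def _mark_position(key: bytes, pk: int, gamma: int, nu: int, xi: int,
                   fp_len: int) -> Optional[Tuple[int, int, int, int]]:
    """Return (attribute, bit, mask, fingerprint index) if tuple pk is selected, else None."""
    if _prf(key, pk, "select") % gamma != 0:
        return None  # only about 1/gamma of the tuples carry a mark
    return (_prf(key, pk, "attr") % nu,     # which attribute to modify
            _prf(key, pk, "bit") % xi,      # which of the xi least significant bits
            _prf(key, pk, "mask") % 2,      # pseudo-random mask hiding the fingerprint bit
            _prf(key, pk, "idx") % fp_len)  # which fingerprint bit this tuple carries


def embed(table: Table, key: bytes, fingerprint: List[int],
          gamma: int = 2, xi: int = 2) -> Table:
    """Return a fingerprinted copy of table; the input table is left unchanged."""
    out: Table = {}
    for pk, values in table.items():
        values = list(values)
        pos = _mark_position(key, pk, gamma, len(values), xi, len(fingerprint))
        if pos is not None:
            attr, bit, mask, idx = pos
            mark = mask ^ fingerprint[idx]
            values[attr] = (values[attr] & ~(1 << bit)) | (mark << bit)
        out[pk] = values
    return out


def detect(table: Table, key: bytes, fp_len: int,
           gamma: int = 2, xi: int = 2) -> List[Optional[int]]:
    """Recover each fingerprint bit by majority vote; None means a tie or no votes."""
    votes = [[0, 0] for _ in range(fp_len)]
    for pk, values in table.items():
        pos = _mark_position(key, pk, gamma, len(values), xi, fp_len)
        if pos is not None:
            attr, bit, mask, idx = pos
            observed = (values[attr] >> bit) & 1
            votes[idx][observed ^ mask] += 1
    return [None if zeros == ones else int(ones > zeros) for zeros, ones in votes]
```

Each fingerprint bit is repeated across many selected tuples, so detection reduces to a majority vote. Flipping a small fraction of low-order bits, deleting tuples, or adding tuples whose keys yield pseudo-random votes shifts only a few votes per bit. An attacker without the key has no direct way to locate the marked tuples and bits.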
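For the second scenario, the notion of an inference interval can be made concrete with the simplest possible case: bounded real values and a single SUM query over all of them. Each value's tightest interval then follows directly from the bounds of the other values. The Python sketch below computes those intervals and flags values whose interval is narrower than a threshold. The names `sum_inference_intervals` and `compromised` and the `threshold` policy are hypothetical illustrations, not the thesis's auditing algorithm. For a general set of overlapping aggregation queries, each bound must instead be found by minimizing and maximizing the value subject to all query constraints, which is where the mathematical programming mentioned above comes in.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]


def sum_inference_intervals(bounds: Sequence[Interval], total: float) -> List[Interval]:
    """Tightest interval for each value given only the bounds and one SUM answer.

    bounds[i] = (lo_i, hi_i) is the publicly known range of value i;
    total is the answer to a single SUM query over all the values.
    """
    lo_sum = sum(lo for lo, _ in bounds)
    hi_sum = sum(hi for _, hi in bounds)
    intervals = []
    for lo, hi in bounds:
        # The other values together can contribute anything in [others_lo, others_hi].
        others_lo, others_hi = lo_sum - lo, hi_sum - hi
        intervals.append((max(lo, total - others_hi), min(hi, total - others_lo)))
    return intervals


def compromised(bounds: Sequence[Interval], total: float, threshold: float) -> List[int]:
    """Indices whose inference interval is narrower than threshold (hypothetical policy)."""
    return [i for i, (a, b) in enumerate(sum_inference_intervals(bounds, total))
            if b - a < threshold]


if __name__ == "__main__":
    bounds = [(0, 100)] * 3  # e.g. three values, each known to lie in [0, 100]
    print(sum_inference_intervals(bounds, 290))     # [(90, 100), (90, 100), (90, 100)]
    print(compromised(bounds, 290, threshold=15))   # [0, 1, 2]
    print(compromised(bounds, 150, threshold=15))   # []
```

A SUM of 290 reveals that every value lies in [90, 100], which is a narrow interval even though no single value was queried. A SUM of 150 reveals nothing beyond the original bounds.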
Unlike the first scenario, in which the data are released to users, and the second, in which users are allowed to query the data, this scenario uses log data to build profiles of users' normal behavior and to detect suspicious anomalies that deviate from those profiles. Experiments show that our proposed method is more flexible and precise than previous methods that either do not use time information or simply use a fixed partition of time intervals in profiling. (Abstract shortened by UMI.)
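The abstract does not describe the profiling method itself. As a hedged illustration of why time information helps, here is a minimal Python sketch. It is not the thesis's method. The sketch profiles each (user, action) pair by the times of day at which the action occurred, then scores a new event by counting the past events that fall within a circular window of plus or minus `window` minutes around it. Unlike a fixed partition into hourly bins, the window is centered on the event, so 23:55 and 00:05 count as close. The class `TimeAwareProfile` and its parameters `window` and `min_support` are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Tuple

MINUTES_PER_DAY = 24 * 60
Event = Tuple[str, str, int]  # (user, action, minute of day)


def circular_distance(a: int, b: int) -> int:
    """Distance in minutes between two times of day, wrapping around midnight."""
    d = abs(a - b) % MINUTES_PER_DAY
    return min(d, MINUTES_PER_DAY - d)


class TimeAwareProfile:
    """Per-(user, action) profile of the times of day at which the action was observed."""

    def __init__(self, window: int = 60, min_support: int = 3) -> None:
        self.window = window            # +/- minutes that count as "a similar time"
        self.min_support = min_support  # fewer similar past events than this => anomaly
        self.history: DefaultDict[Tuple[str, str], List[int]] = defaultdict(list)

    def train(self, log: Iterable[Event]) -> None:
        for user, action, minute in log:
            self.history[(user, action)].append(minute % MINUTES_PER_DAY)

    def support(self, user: str, action: str, minute: int) -> int:
        """Number of past events of this kind within the window around minute."""
        past = self.history.get((user, action), [])
        return sum(1 for m in past if circular_distance(m, minute) <= self.window)

    def is_anomalous(self, user: str, action: str, minute: int) -> bool:
        return self.support(user, action, minute) < self.min_support


if __name__ == "__main__":
    profile = TimeAwareProfile(window=60, min_support=3)
    profile.train([("alice", "login", m) for m in (530, 540, 545, 555, 570)]  # near 09:00
                  + [("bob", "login", m) for m in (1430, 1435, 5, 10)])       # near midnight
    print(profile.is_anomalous("alice", "login", 560))  # False: 09:20 fits her habit
    print(profile.is_anomalous("alice", "login", 180))  # True: 03:00 is unusual
    print(profile.is_anomalous("bob", "login", 0))      # False: window wraps past midnight
```

With hourly bins, Bob's logins would be split between the 23:00 and 00:00 bins and each bin would look sparse. The centered window keeps them together.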