DataSynth: generating synthetic data using declarative constraints

Arvind Arasu, Raghav Kaushik, Jian Li
DOI: https://doi.org/10.14778/3402755.3402785
2011-01-01
Abstract: A variety of scenarios such as database system and application testing, data masking, and benchmarking require synthetic database instances, often having complex data characteristics. We present DataSynth, a flexible tool for generating synthetic databases. DataSynth uses a simple and powerful declarative abstraction based on cardinality constraints to specify data characteristics, and uses sophisticated algorithms to efficiently generate database instances satisfying the specified characteristics. The demo will showcase various features of DataSynth using two real-world data generation scenarios.