Conditional Regression Rules

Rui Kang, Shaoxu Song, Chaokun Wang
DOI: https://doi.org/10.1109/icde53745.2022.00231
2022-01-01
Abstract: Mixed data distributions are widely observed. For example, bird migration data consist of the observed locations of various birds in different years, and the data distribution varies across them. Learning a single regression model over such a mixed distribution is often ineffective, while manually segmenting the data (e.g., by bird, date, or region) to learn individual models is labor-intensive. In this paper, we propose to automatically discover regression models that apply conditionally to only a part of the data, namely conditional regression rules (CRRs), inspired by conditional functional dependencies (CFDs), which are functional dependencies (FDs) that hold only on part of the data. Notably, a regression model may apply to different parts of the data; e.g., the seasonal migration of birds is similar in different years. To capture such shared regression models, we investigate the inference of CRRs. An algorithm is devised to learn and discover CRRs from data with the help of CRR inference. Extensive experiments on real-world datasets demonstrate that the discovered conditional regression rules are more effective than regression models without conditions. In particular, with the inference of CRRs, the number of learned CRRs is significantly reduced without sacrificing rule semantics.
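The idea of a regression model that holds only under a condition can be illustrated with a small sketch. This is not the paper's discovery algorithm: the toy data, the (year, season) conditions, the helper names, and the coefficient-matching merge step are all assumptions made for illustration. It compares one global line fit against per-condition fits, then groups conditions whose fitted models coincide, loosely mirroring the notion of a regression model shared across parts of the data.

```python
from collections import defaultdict


def fit_line(points):
    """Ordinary least squares for y = a*x + b over (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx if sxx else 0.0
    return a, mean_y - a * mean_x


def mse(points, model):
    """Mean squared error of a line model (a, b) on (x, y) pairs."""
    a, b = model
    return sum((y - (a * x + b)) ** 2 for x, y in points) / len(points)


# Hypothetical toy data: rows are (year, season, x, y).
# Spring and autumn follow opposite trends; both years behave alike.
rows = []
for year in (2020, 2021):
    rows += [(year, "spring", x, 2.0 * x + 1.0) for x in range(5)]
    rows += [(year, "autumn", x, -2.0 * x + 9.0) for x in range(5)]

# A single regression model over the whole mixed distribution.
all_points = [(x, y) for _, _, x, y in rows]
global_error = mse(all_points, fit_line(all_points))

# One regression model per condition (year, season).
groups = defaultdict(list)
for year, season, x, y in rows:
    groups[(year, season)].append((x, y))
rules = {cond: fit_line(pts) for cond, pts in groups.items()}
conditional_error = sum(
    mse(pts, rules[cond]) * len(pts) for cond, pts in groups.items()
) / len(rows)

# Naive sharing step (an assumption, not the paper's inference):
# conditions whose fitted coefficients match are grouped under one model.
shared = defaultdict(list)
for cond, (a, b) in rules.items():
    shared[(round(a, 6), round(b, 6))].append(cond)

print("global MSE:", global_error)
print("conditional MSE:", conditional_error)
print("rules before/after sharing:", len(rules), len(shared))
```

On this toy data the global fit is a flat line that misses both trends, while the per-condition fits are exact; grouping conditions with matching coefficients halves the number of distinct models without changing any prediction.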