Abstract: Agile analytics can help organizations gain and sustain a competitive advantage by making timely decisions. Approximate query processing (AQP) is a useful approach in agile analytics: it enables fast queries on big data by leveraging a pre-computed sample. One problem with such a sample is that when new data is imported, re-sampling is usually needed to keep the sample fresh and the AQP results sufficiently accurate. Re-sampling from scratch for every batch of new data, called the full re-sampling method and adopted by many existing AQP works, is very costly; a much quicker incremental sampling process, such as reservoir sampling, may be used instead to cover the newly arrived data. However, incremental update methods cannot increase the sample size, which is a problem when the underlying data distribution changes dramatically and the sample needs to be enlarged to maintain AQP accuracy. This paper proposes an adaptive sample update (ASU) approach that monitors the data distribution and avoids re-sampling from scratch as much as possible, using an incremental update method until re-sampling becomes necessary. The paper also proposes an enhanced approach (T-ASU) that, when a small amount of query inaccuracy is tolerable, tries to enlarge the sample without re-sampling from scratch, further reducing the sample update cost. Both approaches are integrated into a state-of-the-art AQP engine for an extensive experimental study. Experimental results on both real-world and synthetic datasets show that the two approaches are faster than the full re-sampling method while achieving almost the same AQP accuracy when the underlying data distribution changes continuously.
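To make the incremental baseline mentioned in the abstract concrete, the following is a minimal sketch of classic reservoir sampling (Algorithm R) applied batch by batch. It is not the paper's ASU or T-ASU method; the class name `Reservoir` and its parameters `capacity` and `seed` are hypothetical names chosen for illustration.

```python
import random


class Reservoir:
    """Fixed-size uniform sample maintained incrementally (Algorithm R).

    Hypothetical illustration only; not the ASU/T-ASU approach itself.
    """

    def __init__(self, capacity, seed=None):
        self.capacity = capacity      # sample size, fixed for the lifetime
        self.sample = []              # current sample rows
        self.seen = 0                 # total rows observed so far
        self._rng = random.Random(seed)

    def add(self, row):
        """Offer one new row to the sample."""
        self.seen += 1
        if len(self.sample) < self.capacity:
            # Reservoir not yet full: keep every row.
            self.sample.append(row)
            return
        # Pick a slot uniformly from all rows seen so far; the new row
        # survives only if that slot falls inside the reservoir, which
        # happens with probability capacity / seen.
        j = self._rng.randrange(self.seen)
        if j < self.capacity:
            self.sample[j] = row

    def add_batch(self, rows):
        """Cover a newly imported batch without re-reading old data."""
        for row in rows:
            self.add(row)


reservoir = Reservoir(capacity=100, seed=7)
reservoir.add_batch(range(1_000))          # initial load
reservoir.add_batch(range(1_000, 6_000))   # newly arrived batch
```

Each new batch is absorbed in time proportional to the batch size, with no pass over the existing data. The sample never grows beyond `capacity`, which is the limitation the abstract points out when the data distribution shifts and a larger sample is needed.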

Learning-based Sample Tuning for Approximate Query Processing in Interactive Data Exploration

Learned Optimizer for Online Approximate Query Processing in Data Exploration

Learning Approximation Sets for Exploratory Queries

AQP++: Connecting Approximate Query Processing with Aggregate Precomputation for Interactive Analytics

QTune: A Query-Aware Database Tuning System with Deep Reinforcement Learning

Learning-Based Optimization for Online Approximate Query Processing

An Agile Sample Maintenance Approach for Agile Analytics

LAQP: Learning-based Approximate Query Processing

Gradient Q : A Unified Algorithm with Function Approximation for Reinforcement Learning

Enhancing Online Index Tuning with a Learned Tuning Diagnostic

SAQP++: Bridging the Gap Between Sampling-Based Approximate Query Processing and Aggregate Precomputation

Combining Aggregation and Sampling (Nearly) Optimally for Approximate Query Processing

Model-based Approximate Query Processing

ShadowAQP: Efficient Approximate Group-by and Join Query Via Attribute-oriented Sample Size Allocation and Data Generation

POLYTOPE: a flexible sampling system for answering exploratory queries

Learning to Optimize Join Queries With Deep Reinforcement Learning

DeepSampling: Selectivity Estimation with Predicted Error and Response Time

When Quantum Computing Meets Database: A Hybrid Sampling Framework for Approximate Query Processing

MISS: finding optimal sample sizes for approximate analytics

Optimized stratified sampling for approximate query processing

AQUA+: Query Optimization for Hybrid Database-MapReduce System