Abstract: Many data insight questions can be viewed as searching a large space of tables to find the important ones, where the notion of importance is defined in an ad hoc, user-defined manner. This paper presents Holistic Cube Analysis (HoCA), a framework that augments the capabilities of relational queries for such problems. HoCA first extends the relational data model with a new data type, AbstractCube, defined as a function that maps a region-features pair to a relational table (a region is a tuple that specifies values for a set of dimensions). AbstractCube provides a logical form of data, and HoCA operators are cube-to-cube transformations. We describe two basic but fundamental HoCA operators, cube crawling and cube join (with many possible extensions). Cube crawling explores a region space and outputs a cube that maps regions to signal vectors. Cube join, in turn, is critical for composition, allowing one to join information from different cubes for deeper analysis. Cube crawling introduces two novel programming features: (programmable) Region Analysis Models (RAMs) and Multi-Model Crawling. Crucially, a RAM has a notion of population features, which lets one go beyond analyzing only the local features of a region and program region-population analyses that compare region and population features, capturing a large class of importance notions. HoCA also opens a rich algorithmic space, including optimizing crawling and join performance and the physical design of cubes. We have implemented and deployed HoCA at Google. Our early HoCA offering has attracted more than 30 teams building applications with it, across a diverse spectrum of fields including system monitoring, experimentation analysis, and business intelligence. For many applications, HoCA enables novel and powerful analyses, such as instances of recurrent crawling, which are challenging to achieve otherwise.
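The abstract describes three ideas: an AbstractCube as a function from a region-features pair to a table, cube crawling as a pass over regions that produces a cube mapping each region to a signal vector computed by a RAM, and cube join as a way to compose cubes. The Python sketch below illustrates those ideas under loudly stated assumptions; it is not HoCA's API. All names (`table_cube`, `enumerate_regions`, `lift_ram`, `crawl`, `cube_join`) are hypothetical, tables are plain lists of dicts, the "population" is assumed to be the whole dataset (the empty region), and the lift score is only a toy example of a region-population importance notion.

```python
from itertools import combinations
from statistics import mean
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Table = List[Row]
# A region fixes values for some dimensions, e.g. (("country", "US"),); () is the whole data.
Region = Tuple[Tuple[str, object], ...]
# Hypothetical encoding: a cube is a function (region, features) -> table.
Cube = Callable[[Region, Tuple[str, ...]], Table]
# Hypothetical encoding: a RAM maps (region rows, population rows) -> signal vector.
RAM = Callable[[Table, Table], Tuple[float, ...]]


def table_cube(base: Table) -> Cube:
    """Wrap a base table as a cube: keep rows inside the region, project the features."""
    def cube(region: Region, features: Tuple[str, ...]) -> Table:
        return [{f: row[f] for f in features}
                for row in base if all(row[d] == v for d, v in region)]
    return cube


def enumerate_regions(cube: Cube, dims: Tuple[str, ...], max_dims: int) -> List[Region]:
    """List every region fixing 1..max_dims of `dims`, using values seen in the data."""
    found = set()
    for k in range(1, max_dims + 1):
        for subset in combinations(dims, k):
            for row in cube((), subset):
                found.add(tuple((d, row[d]) for d in subset))
    return sorted(found, key=repr)


def lift_ram(metric: str) -> RAM:
    """Toy region-population analysis: compare a region's mean metric to the population's."""
    def ram(region_rows: Table, population_rows: Table) -> Tuple[float, ...]:
        region_mean = mean(r[metric] for r in region_rows)
        population_mean = mean(r[metric] for r in population_rows)
        return (region_mean, population_mean, region_mean / population_mean)
    return ram


def crawl(cube: Cube, dims: Tuple[str, ...], features: Tuple[str, ...],
          ram: RAM, max_dims: int = 1) -> Dict[Region, Tuple[float, ...]]:
    """Visit each region and map it to the RAM's signal vector (a materialized cube)."""
    population = cube((), features)  # assumption: population = the whole dataset
    signals = {}
    for region in enumerate_regions(cube, dims, max_dims):
        rows = cube(region, features)
        if rows:
            signals[region] = ram(rows, population)
    return signals


def cube_join(left: Dict[Region, Tuple[float, ...]],
              right: Dict[Region, Tuple[float, ...]]) -> Dict[Region, Tuple[float, ...]]:
    """Inner-join two region->signal cubes on region, concatenating signal vectors."""
    return {r: left[r] + right[r] for r in left if r in right}


# Usage on a tiny, made-up latency table.
base = [
    {"country": "US", "device": "web", "latency": 100.0},
    {"country": "US", "device": "app", "latency": 300.0},
    {"country": "DE", "device": "web", "latency": 100.0},
    {"country": "DE", "device": "app", "latency": 100.0},
]
cube = table_cube(base)
lifts = crawl(cube, ("country", "device"), ("latency",), lift_ram("latency"))
sizes = crawl(cube, ("country", "device"), ("latency",), lambda rows, pop: (float(len(rows)),))
joined = cube_join(lifts, sizes)  # signal = (region mean, population mean, lift, row count)
important = sorted(r for r, s in joined.items() if s[2] > 1.2)
print(important)  # -> [(('country', 'US'),), (('device', 'app'),)]
```

Both crawls enumerate the same four one-dimensional regions, so the join keeps all of them and concatenates the two signal vectors. Filtering on the lift component then surfaces the regions whose mean latency (200.0) stands out against the population mean (150.0).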

ParaCube: A Scalable OLAP Model Based on Distributed Aggregate Computing with Sibling Cubes

Requirement-Based Data Cube Schema Design

Towards the Building of a Dense-Region-based OLAP System

A Clustered Dwarf Structure to Speed Up Queries on Data Cubes

ScanChunk: An Efficient Algorithm for Hunting Dense Regions in Data Cube

APIC: An Efficient Algorithm for Computing Iceberg Datacubes

DROLAP - A Dense-Region Based Approach to On-Line Analytical Processing

Strategies for Complex Data Cube Queries

A Practice of TPC-DS Multidimensional Implementation on NoSQL Database Systems

Index-Based OLAP Aggregation for In-Memory Cluster Computing

AQP++: Connecting Approximate Query Processing with Aggregate Precomputation for Interactive Analytics

Efficient Cube Computing on an Extended Multidimensional Model over Uncertain Data

Regression Cubes with Lossless Compression and Aggregation

Approximate Aggregations in Structured P2P Networks

A Cube Algebra with Comparative Operations: Containment, Overlap, Distance and Usability

Holistic Cube Analysis: A Query Framework for Data Insights

SAQP++: Bridging the Gap Between Sampling-Based Approximate Query Processing and Aggregate Precomputation

Temporal Graph Cube

Progressive online aggregation in a distributed stream system

SmartCube: An Adaptive Data Management Architecture for the Real-Time Visualization of Spatiotemporal Datasets

SparkRDF: Elastic Discreted RDF Graph Processing Engine with Distributed Memory