Abstract:Automated visualization recommendations (vis-rec) help users to derive crucial insights from new datasets. Typically, such automated vis-rec models first calculate a large number of statistics from the datasets and then use machine-learning models to score or classify multiple visualizations choices to recommend the most effective ones, as per the statistics. However, state-of-the art models rely on very large number of expensive statistics and therefore using such models on large datasets become infeasible due to prohibitively large computational time, limiting the effectiveness of such techniques to most real world complex and large datasets. In this paper, we propose a novel reinforcement-learning (RL) based framework that takes a given vis-rec model and a time-budget from the user and identifies the best set of input statistics that would be most effective while generating the visual insights within a given time budget, using the given model. Using two state-of-the-art vis-rec models applied on three large real-world datasets, we show the effectiveness of our technique in significantly reducing time-to visualize with very small amount of introduced error. Our approach is about 10X times faster compared to the baseline approaches that introduce similar amounts of error.

What problem does this paper attempt to address?

The main problem that this paper attempts to solve is the scalability issue of existing Visualization - Recommendation (Vis - Rec) models when dealing with large - scale datasets. Specifically, in order to be able to generalize on unknown datasets, these models calculate a large number of statistical features, which becomes infeasible when facing large - scale datasets because the calculation time is too long, resulting in poor performance of these techniques in practical applications. ### Summary of Main Problems: 1. **High Computational Complexity**: Existing Vis - Rec models rely on a large number of statistical features (for example, Qian et al. used 1,006 high - order statistical features per column in [13]), which makes the calculation on large - scale datasets very expensive and time - consuming. 2. **Not Applicable to Large - Scale Datasets**: As the scale of the dataset increases, the time to calculate these statistical features increases sharply, making these models unable to work effectively in practical applications. 3. **Lack of Flexibility**: Traditional solutions (such as random selection or sampling) may lead to a decline in the quality of generated visualization recommendations or fail to accurately reflect the characteristics of the entire dataset. ### Solutions Proposed in the Paper: To solve the above problems, the paper proposes a new framework named ScaleViz. ScaleViz optimizes the performance of Vis - Rec models on large - scale datasets through the following steps: 1. **Cost Analysis**: Use a small number of data samples of different scales to estimate the calculation cost of each statistical feature. 2. **Budget - Aware Reinforcement Learning**: Utilize Reinforcement Learning (RL) techniques to gradually learn and select the most effective statistical features within a given time budget. 3. **Feature Selection and Inference**: Based on the learned knowledge, only calculate the selected statistical features and generate visualization recommendations. Through this method, ScaleViz can significantly reduce the calculation time while ensuring the recommendation quality, achieving a speed - up of up to 10 times. ### Formula Representation: The formulas involved in the paper are mainly used to describe the optimization problem and the reinforcement learning process. For example, the optimization problem can be formalized as: \[ \min_{\theta} L[P(\theta(f) \odot f) - P(f)], \quad \text{subject to:} \sum_{i,j} \theta(f) \odot c(f) \leq B \] where: - \(L\) is the loss function, which is used to compare the differences in model outputs under different feature sets. - \(c(f)\) is the cost function for calculating features. - \(B\) is the time budget specified by the user. In this way, ScaleViz can efficiently generate high - quality visualization recommendations on large - scale datasets.

ScaleViz: Scaling Visualization Recommendation Models on Large Data

RECL: Responsive Resource-Efficient Continuous Learning for Video Analytics

ML-based Visualization Recommendation: Learning to Recommend Visualizations from Data

VizRec: A framework for secure data exploration via visual representation

VizML: A Machine Learning Approach to Visualization Recommendation

V-RECS, a Low-Cost LLM4VIS Recommender with Explanations, Captioning and Suggestions

Visualization Assessment - A Machine Learning Approach.

VizGRank: A Context-Aware Visualization Recommendation Method Based on Inherent Relations Between Visualizations.

Agnostic Visual Recommendation Systems: Open Challenges and Future Directions

BlinkViz: Fast and Scalable Approximate Visualization on Very Large Datasets Using Neural-Enhanced Mixed Sum-Product Networks

Understanding Scaling Laws for Recommendation Models

AdaVis: Adaptive and Explainable Visualization Recommendation for Tabular Data

Scaling New Frontiers: Insights into Large Recommendation Models

Scaling Law of Large Sequential Recommendation Models

Towards model-free RL algorithms that scale well with unstructured data

When Do We Not Need Larger Vision Models?

Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design

Visual Data Analysis with Task-Based Recommendations

Personalized Visualization Recommendation