Jupyter Scatter: Interactive Exploration of Large-Scale Datasets

Fritz Lekschas, Trevor Manz
2024-06-21
Abstract: Jupyter Scatter is a scalable, interactive, and interlinked scatterplot widget for exploring datasets in Jupyter Notebook/Lab, Colab, and VS Code. Its goal is to simplify the visual exploration, analysis, and comparison of large-scale bivariate datasets. Jupyter Scatter can render up to twenty million points, supports fast point selections, integrates with Pandas DataFrame and Matplotlib, uses perceptually-effective default settings, and offers a user-friendly API.
Human-Computer Interaction
What problem does this paper attempt to address?
The paper introduces Jupyter Scatter, a tool for efficient, interactive exploration and analysis of large-scale bivariate datasets in Jupyter Notebook, JupyterLab, Google Colab, and VS Code. Its specific goals are:

1. **Simplifying visual exploration and analysis**: A scalable, interactive scatterplot widget lets users explore large-scale datasets intuitively.
2. **Supporting large datasets**: It can render up to 20 million data points and supports fast point selection.
3. **Optimizing default settings**: It applies perceptually effective defaults, for example for color and transparency, so that plots are readable without manual tuning.
4. **Providing a user-friendly API**: Users can easily configure and customize the attributes of the scatterplot.
5. **Implementing multi-plot synchronization**: Users can create multiple interlinked scatterplots whose views and selections are synchronized, which helps in comparing different datasets or different subsets of the same dataset.

In summary, Jupyter Scatter addresses the inefficiency, lack of interactivity, and difficulty of comparison that data scientists face when exploring large-scale bivariate data.
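The workflow described above might look roughly like the following in a notebook. This is a minimal sketch, not code from the paper: it assumes the `jupyter-scatter` package (imported as `jscatter`) and pandas are installed, and the helper names `make_points` and `linked_scatters`, the column names, and the synthetic data are all hypothetical. The widget imports are deferred into `linked_scatters` so that the data helper runs on its own.

```python
import random


def make_points(n=10_000, seed=0):
    """Generate n synthetic 2D points in two Gaussian clusters (illustrative data only)."""
    rng = random.Random(seed)
    rows = {"x": [], "y": [], "group": []}
    for i in range(n):
        # Alternate between two clusters centered at (-1, -1) and (1, 1).
        group = "a" if i % 2 == 0 else "b"
        center = -1.0 if group == "a" else 1.0
        rows["x"].append(rng.gauss(center, 0.5))
        rows["y"].append(rng.gauss(center, 0.5))
        rows["group"].append(group)
    return rows


def linked_scatters(rows):
    """Build two linked scatterplots of the same data (needs pandas and jupyter-scatter)."""
    import pandas as pd
    import jscatter

    df = pd.DataFrame(rows)

    # Two views of the same DataFrame, colored by the categorical column.
    left = jscatter.Scatter(data=df, x="x", y="y", color_by="group")
    right = jscatter.Scatter(data=df, x="y", y="x", color_by="group")

    # Density-based opacity helps with overplotting in dense regions.
    left.opacity(by="density")
    right.opacity(by="density")

    # link() composes the plots and synchronizes their views and selections.
    return jscatter.link([left, right])
```

In a notebook cell, evaluating `linked_scatters(make_points())` as the last expression should display the two plots side by side. After lassoing points in one of them, calling `selection()` on the corresponding `Scatter` object is expected to return the indices of the selected rows, which can then be used to subset the DataFrame.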