What problem does this paper attempt to address?

The problem this paper addresses is how to visualize and interact with large-scale behavioral data effectively, so that human experts can perceive, compare, and understand the data more easily. Specifically, the paper proposes an algorithm named split-diffuse (SD), which aims to distribute high-dimensional data points evenly in a two-dimensional or three-dimensional visualization space, making individual points easier to distinguish and interact with.

### Specific Background of the Problem

1. **Visualization Challenges of High-Dimensional Data**:
   - When data has multiple measurement dimensions, each sample is represented in a high-dimensional space \( H \). Examples include data collected by network sensors, quantitative indicators in the stock market, and word-frequency vectors of documents.
   - High-dimensional data is difficult to visualize directly, so dimension-reduction techniques (such as PCA, MDS, and t-SNE) are used to map it to a low-dimensional space \( L \) (usually 2D or 3D). However, existing dimension-reduction methods may distribute data points unevenly in the visualization space, which makes the relative relationships between points harder to read.
2. **Limitations of Existing Dimension-Reduction Methods**:
   - Data points may overlap in the dimension-reduced visualization space, making the information difficult to identify.
   - Data points are too dense in some areas, which makes interacting with the data harder.
   - When comparing behaviors across different time periods or different targets, geometric relationships may mask the actual differences.

### The Method Proposed in the Paper

To overcome these problems, the paper proposes the split-diffuse (SD) algorithm. The main objectives of this algorithm are:

- **Evenly Distribute Data Points**: Distribute data points evenly in the visualization space through recursive splitting and diffusion.
- **Maintain Topological Relationships between Points**: Preserve the relative positions of data points as far as possible during dimension reduction.
- **Improve Interactivity**: With evenly distributed data points, users can interact with and compare them more conveniently.

### Application Scenarios

The paper demonstrates the SD algorithm in the following application scenarios:

- **Network Security Field**: Analyze network activity logs to detect abnormal behaviors and provide visual risk assessments.
- **Other Fields**: Examples include e-commerce log analysis of customers' shopping behaviors and preferences, credit card transaction analysis, and customer complaint analysis.

### Summary

The core problem of the paper is how to improve the interpretability and interactivity of large-scale behavioral data analysis by improving the distribution of data points in the visualization space. The SD algorithm is the paper's proposed solution, intended to let human experts understand and compare complex behavioral data more intuitively.
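To make the idea of recursive splitting and even spreading more concrete, here is a minimal Python sketch. It is not the paper's SD algorithm, whose details are not given in this summary. It is a simplified stand-in that assumes the points have already been reduced to 2D (for example by PCA), splits them at the median along alternating axes, and gives each half a proportional share of the drawing area. The function name `spread_evenly` and the unit-square default are hypothetical choices made for illustration.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def spread_evenly(points: Sequence[Point],
                  box: Box = (0.0, 0.0, 1.0, 1.0)) -> List[Point]:
    """Illustrative sketch only (not the paper's SD algorithm).

    Recursively split the points at the median along alternating axes and
    give each half a share of the box proportional to its point count.
    Each point ends up at the center of its own cell, so dense clusters
    are spread out while left/right and below/above order within each
    split is kept.
    """
    out: List[Optional[Point]] = [None] * len(points)

    def recurse(ids: List[int], cell: Box, axis: int) -> None:
        x0, y0, x1, y1 = cell
        if not ids:
            return
        if len(ids) == 1:
            # A single point is placed at the center of its cell.
            out[ids[0]] = ((x0 + x1) / 2, (y0 + y1) / 2)
            return
        # Split step: order the points along the current axis, cut at the median.
        ordered = sorted(ids, key=lambda i: points[i][axis])
        half = len(ordered) // 2
        frac = half / len(ordered)
        # Spread step: cut the cell in the same proportion as the point counts.
        if axis == 0:
            cut = x0 + frac * (x1 - x0)
            low, high = (x0, y0, cut, y1), (cut, y0, x1, y1)
        else:
            cut = y0 + frac * (y1 - y0)
            low, high = (x0, y0, x1, cut), (x0, cut, x1, y1)
        recurse(ordered[:half], low, 1 - axis)
        recurse(ordered[half:], high, 1 - axis)

    recurse(list(range(len(points))), box, 0)
    return out  # type: ignore[return-value]


# Four tightly clustered points are spread over the unit square.
clustered = [(0.10, 0.10), (0.11, 0.12), (0.12, 0.11), (0.13, 0.13)]
print(spread_evenly(clustered))
# [(0.25, 0.25), (0.25, 0.75), (0.75, 0.25), (0.75, 0.75)]
```

In this sketch, points that were nearly on top of each other end up in separate cells, which addresses the overlap and density problems described above, at the cost of discarding the original distances between points.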

Large Scale Behavioral Analytics via Topical Interaction

Interacting with Massive Behavioral Data

Online Visual Analytics of Text Streams

Parallel Visualization for Large-Scale Datasets

Optimizing temporal topic segmentation for intelligent text visualization

A Novel Visual Analytics Approach for Clustering Large-Scale Social Data

Visual Analytics of Taxi Trajectory Data Via Topical Sub-trajectories

Visual Analytics of the Spatio-temporal Multidimensional Air Monitoring Data

Closed-loop Big Data Analysis with Visualization and Scalable Computing

Large-Scale Evaluation of Topic Models and Dimensionality Reduction Methods for 2D Text Spatialization

Visualizing Large-Scale Spatial Time Series with GeoChron

An Interactive Visual Analytics System for Incremental Classification Based on Semi-supervised Topic Modeling

Visual Analysis of Group Behavior Based on Origin-Destination Data

Scalable Multi-variate Analytics of Seismic and Satellite-based Observational Data

Topic-based Visual Text Summarization and Analysis

Real-Time Visual Analysis of High-Volume Social Media Posts

A Hierarchical Aggregation Framework for Efficient Multilevel Visual Exploration and Analysis

Visual Analysis of Large Multivariate Scattered Data using Clustering and Probabilistic Summaries

Visual Abstraction and Exploration of Multi-class Scatterplots

VISTopic: A visual analytics system for making sense of large document collections using hierarchical topic modeling

Level Set Restricted Voronoi Tessellation for Large scale Spatial Statistical Analysis