What problem does this paper attempt to address?

This paper addresses how to compress and approximate geometric data efficiently in the big-data setting. The goal is for complex algorithms to run quickly on the compressed data while the accuracy of their results remains guaranteed. Specifically, the paper focuses on two data-summarization techniques: **Coresets** and **Sketches**.

### Core Problems

1. **Coresets**: A coreset is a small (often weighted) subset of the data that acts as a proxy for the full data set. Running an algorithm on the coreset gives results that approximate those obtained on the full data set. The paper surveys coresets for several problem classes, including shape fitting, density estimation, high-dimensional vectors, high-dimensional point sets/matrices, and clustering.
2. **Sketches**: A sketch maps the full data set to a compact, easily updatable data structure. The answers to certain queries on the sketch approximate the answers on the full data set. The paper concentrates on linear sketches, where the mapping is a linear function of the data points. This makes it straightforward to add, delete, or modify data points.

### Specific Objectives

- **Shape Fitting**: Find the shape that best fits a given point set, such as the minimum enclosing ball, and construct ε-kernel coresets.
- **Density Estimation**: Select a subset of a discrete density (a point set) so that the density it induces approximates that of the original data set under a specified error measure.
- **High-Dimensional Vectors**: Approximate the frequency counts and frequency moments of high-dimensional vectors.
- **High-Dimensional Point Sets/Matrices**: Compute low-rank approximations of high-dimensional point sets or matrices, particularly in streaming and distributed computing settings.
- **Clustering**: Use coresets and sketches to reduce the computational cost of clustering.
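The linear-sketch idea and the frequency-count objective for high-dimensional vectors can be illustrated with a Count-Min sketch. This is a standard linear sketch, swapped in here as an example. The paper may present frequency sketches differently.

The following is a minimal Python sketch under several assumptions. Keys are integers, and the final frequency vector has no negative entries (the strict turnstile model). The class name `CountMinSketch`, the `merge` helper, and the hash family ((a·x + b) mod p) mod w are illustrative choices.

With width ⌈e/ε⌉ and depth ⌈ln(1/δ)⌉, each estimate is at least the true count. With probability at least 1 − δ, it exceeds the true count by at most ε‖f‖₁.

```python
import math
import random


class CountMinSketch:
    """Illustrative Count-Min sketch: a linear sketch of a frequency vector f over integer keys."""

    PRIME = 2**61 - 1  # Mersenne prime for the (approximately) pairwise-independent hash family

    def __init__(self, eps, delta, seed=0):
        self.width = math.ceil(math.e / eps)          # columns per row
        self.depth = math.ceil(math.log(1 / delta))   # independent hash rows
        rng = random.Random(seed)
        # One (a, b) pair per row defines h(x) = ((a * x + b) mod p) mod width.
        self.hashes = [(rng.randrange(1, self.PRIME), rng.randrange(self.PRIME))
                       for _ in range(self.depth)]
        self.table = [[0] * self.width for _ in range(self.depth)]

    def _bucket(self, row, key):
        a, b = self.hashes[row]
        return ((a * key + b) % self.PRIME) % self.width

    def update(self, key, delta=1):
        """Add delta to f[key]; a negative delta models a deletion."""
        for r in range(self.depth):
            self.table[r][self._bucket(r, key)] += delta

    def query(self, key):
        """Estimate f[key] as the minimum counter over all rows (never below f[key] if f >= 0)."""
        return min(self.table[r][self._bucket(r, key)] for r in range(self.depth))

    def merge(self, other):
        """Linearity: the sketch of f + g is the entrywise sum of the tables (same hashes required)."""
        assert self.hashes == other.hashes, "sketches must share hash functions"
        for r in range(self.depth):
            for c in range(self.width):
                self.table[r][c] += other.table[r][c]
```

Every update only adds `delta` to counters, so deleting an item is just `update(key, -1)`. Two sketches built with the same seed on different machines can be combined with `merge`. This linearity is what makes linear sketches convenient in streaming and distributed settings.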
### Technical Challenges

- **Space Efficiency**: Storing coresets and sketches in limited space, especially in streaming and distributed computing environments.
- **Time Efficiency**: Constructing and updating coresets and sketches quickly, especially on large-scale data sets.
- **Error Control**: Guaranteeing that the error between results computed on a coreset or sketch and results on the full data set stays within an acceptable bound.

### Methodology

- **Random Sampling**: Construct coresets and sketches by random sampling, using tools such as VC dimension to bound the approximation error.
- **Merge-Reduce Framework**: In streaming and distributed computing, build coresets and sketches efficiently by merging small summaries and then reducing the result back to a small size.
- **Linear Projection**: Use the Johnson-Lindenstrauss lemma to reduce dimensionality via random projection while approximately preserving pairwise distances.

### Application Scenarios

- **Machine Learning**: Train models on large-scale data sets at a lower computational cost.
- **Data Mining**: Quickly discover patterns and trends in large data sets.
- **Graphics Processing**: Perform shape fitting and cluster analysis on high-dimensional data sets.

Overall, the paper presents coresets and sketches as efficient and accurate data-compression techniques for meeting the computational challenges of big data.
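The Merge-Reduce Framework listed under Methodology can be illustrated on a deliberately simple problem: approximate one-dimensional rank queries ("how many items are ≤ x?") over a stream. This is a minimal sketch under the following assumptions, and it is not the paper's construction:

- Items are buffered into blocks of an even size k.
- Full summaries are kept in a binary-counter-like list of levels.
- The reduce step sorts the union of two equal-weight summaries and keeps every other point, doubling each survivor's weight.

The class name `MergeReduceRanks` and the helper `halve` are hypothetical. Each reduce at level l adds at most 2^l rank error, and at most n/(k·2^(l+1)) reduces happen at that level. The total additive error is therefore at most n·L/(2k), where L is the number of levels.

```python
import random
from bisect import bisect_right


def halve(a, b):
    """Reduce step: merge two sorted summaries of equal weight w and keep every other point.

    Each survivor then represents weight 2w; for any x the weighted count of points <= x
    overestimates the merged count by at most w.
    """
    merged = sorted(a + b)
    return merged[::2]


class MergeReduceRanks:
    """Illustrative merge-reduce rank summary for a stream of comparable items."""

    def __init__(self, block_size):
        assert block_size % 2 == 0, "block size must be even so halving keeps k points"
        self.k = block_size
        self.buffer = []   # raw items not yet forming a full block (counted exactly)
        self.levels = []   # levels[l] is None or a sorted list of k points, each of weight 2**l
        self.n = 0

    def insert(self, x):
        self.n += 1
        self.buffer.append(x)
        if len(self.buffer) == self.k:
            self._carry(sorted(self.buffer), 0)
            self.buffer = []

    def _carry(self, summary, level):
        # Like incrementing a binary counter: merge with an occupied level and push the result up.
        while True:
            if level == len(self.levels):
                self.levels.append(None)
            if self.levels[level] is None:
                self.levels[level] = summary
                return
            summary = halve(self.levels[level], summary)
            self.levels[level] = None
            level += 1

    def rank(self, x):
        """Estimated number of inserted items <= x (never an underestimate)."""
        est = sum(1 for y in self.buffer if y <= x)
        for level, summary in enumerate(self.levels):
            if summary is not None:
                est += (2 ** level) * bisect_right(summary, x)
        return est

    def error_bound(self):
        # Level l reduces happen at most n / (k * 2**(l+1)) times, each adding at most 2**l error.
        return self.n * len(self.levels) / (2 * self.k)


if __name__ == "__main__":
    random.seed(0)
    data = [random.random() for _ in range(10_000)]
    summary = MergeReduceRanks(block_size=64)
    for v in data:
        summary.insert(v)
    exact = sum(1 for v in data if v <= 0.5)
    print(exact, summary.rank(0.5), summary.error_bound())
```

Only about log2(n/k) summaries of k points are alive at any time, so space grows logarithmically with the stream length. Doubling k halves the per-level error at the cost of storing twice as many points per level. The same merge-then-reduce pattern is what lets independently built summaries on different machines be combined.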

Coresets and Sketches

Introduction to Core-sets: an Updated Survey

Coresets for Clustering in Geometric Intersection Graphs

Coresets for Clustering in Euclidean Spaces: Importance Sampling is Nearly Optimal

Coresets for Kinematic Data: From Theorems to Real-Time Systems

Coreset Construction and Estimation over Stochastic Data

On Coresets for Clustering in Small Dimensional Euclidean Spaces

Coresets for Clustering in Graphs of Bounded Treewidth

SketchGraphs: A Large-Scale Dataset for Modeling Relational Geometry in Computer-Aided Design

Statistical properties of sketching algorithms

Geometric Understanding of Sketches

Representations, Metrics and Statistics For Shape Analysis of Elastic Graphs

Small coresets via negative dependence: DPPs, linear statistics, and concentration

Coresets for Time Series Clustering

Coresets for Clustering with Missing Values

Experimental Evaluation of Fully Dynamic k-Means via Coresets

Digital Geometry, a Survey

Coresets for Constrained Clustering: General Assignment Constraints and Improved Size Bounds

Pyramid Sketch

Coresets for kernel clustering