Measuring Data Abstraction Quality in Multiresolution Visualization

Qingguang Cui, M. Ward, Elke A. Rundensteiner, Jing Yang
2006-01-01
Abstract: Data abstraction techniques such as filtering, sampling, clustering, and summarizing can be used to reduce the size of a large dataset while maintaining the dominant characteristics of the original data. They are widely used in multiresolution visualization systems to reduce visual clutter and facilitate analysis from overview to detail. However, analysts are usually unaware of how well the abstracted data represent the original dataset, which can impact the reliability of results gleaned from the abstractions. In this paper, we define two data abstraction quality measures for computing the degree to which the abstraction conveys the original dataset: the Histogram Difference Measure and the Nearest Neighbor Measure. Each is inspired by information and abstraction measures that have been successfully used in other disciplines, including pattern recognition, image retrieval, image compression, and approximate query processing. These measures have been integrated within XmdvTool, a public-domain multiresolution visualization system for multivariate data analysis that supports sampling as well as clustering to simplify data. Each abstraction quality measure is computed for the data abstraction currently being displayed and is presented to the analyst. The measures indicate how much confidence to place in the discovered patterns, helping analysts make more accurate decisions. Several interactive operations are provided, including adjusting the data abstraction level, changing selected regions, and setting the acceptable data abstraction quality level. With these operations, analysts can select an optimal data abstraction level, trading off data density on the screen against data abstraction quality. Analysts can also use the measures to compare different abstraction methods, seeing how well relative data density and outliers are maintained, and then select an abstraction method that meets the requirements of their analytic tasks.
CR Categories: H.5.2 [Information Interfaces and Presentation]: User Interfaces—Graphical user interfaces; I.5.3 [Pattern Recognition]: Clustering—Similarity Measures
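The abstract names a Histogram Difference Measure but does not give its formula. The sketch below is one plausible reading of the general idea, not the authors' definition. It assumes that quality is one minus half the summed absolute difference between normalized per-dimension histograms of the original and abstracted data, averaged over dimensions. The function names, the bin count, and the averaging step are all hypothetical choices made for illustration.

```python
def normalized_histogram(values, lo, hi, bins):
    """Return bin fractions (summing to 1) for values over the range [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins if hi > lo else 1.0
    for v in values:
        # Clamp so that values on or outside the range edges fall in a valid bin.
        idx = max(0, min(int((v - lo) / width), bins - 1))
        counts[idx] += 1
    total = len(values)
    return [c / total for c in counts]


def histogram_difference_quality(original, abstracted, bins=10):
    """Hypothetical quality score in [0, 1]; 1 means identical bin fractions.

    Assumption: this is an illustrative stand-in, not the paper's formula.
    Both inputs are lists of equal-length numeric tuples (rows).
    """
    dims = len(original[0])
    scores = []
    for d in range(dims):
        col_orig = [row[d] for row in original]
        col_abs = [row[d] for row in abstracted]
        # Bin both datasets over the original data's range in this dimension.
        lo, hi = min(col_orig), max(col_orig)
        p = normalized_histogram(col_orig, lo, hi, bins)
        q = normalized_histogram(col_abs, lo, hi, bins)
        # Half the L1 distance between two distributions lies in [0, 1].
        difference = 0.5 * sum(abs(a - b) for a, b in zip(p, q))
        scores.append(1.0 - difference)
    return sum(scores) / dims


# Toy data: 100 evenly spread two-dimensional points.
original = [(float(i), float(i)) for i in range(100)]
even_sample = original[::10]    # one point from each region of the range
biased_sample = original[:10]   # all points from the low end of the range

even_quality = histogram_difference_quality(original, even_sample)
biased_quality = histogram_difference_quality(original, biased_sample)
```

Under these assumptions, a sample that preserves the relative density of the original scores higher than one drawn from a single region, which is the kind of comparison between abstraction methods that the abstract describes.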