Visualizing Document Similarity

Nan Cao, Weiwei Cui
DOI: https://doi.org/10.2991/978-94-6239-186-4_4
2016-01-01
Abstract: A large category of text visualization techniques has been developed to illustrate the similarities among the documents in a corpus. This is the most traditional research direction in the field. These techniques produce visualizations of a similar form: each document is represented as a point on the display, sized by its importance and colored or shaped by its semantics (such as its topic). The screen distance between any pair of points indicates the similarity of the corresponding documents, as captured by the measure used in the underlying analysis model, following the rule "the closer, the more similar". Two major types of approaches have been proposed to produce such a document overview: projection-based (i.e., dimension-reduction-based) methods and semantic-oriented document visualizations. The two types are built on different data and analysis models. In this section, we investigate these techniques and the corresponding document visualization systems.
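
The projection-based idea described in the abstract can be illustrated with a minimal sketch. The specific choices here are assumptions made for illustration and are not taken from the chapter: documents are reduced to bag-of-words term counts, dissimilarity is cosine distance, and the 2D layout is computed by stress majorization (a form of multidimensional scaling). All function names (`term_vector`, `cosine_distance`, `project_2d`, `layout_documents`) are hypothetical.

```python
import math
import random
from collections import Counter


def term_vector(text):
    """Bag-of-words term counts (assumed model: lowercase, whitespace split)."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """1 - cosine similarity of two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def project_2d(dissim, iterations=300, seed=0):
    """Place n points in 2D so screen distances approximate `dissim`.

    Uses stress majorization with unit weights: every iteration applies
    the Guttman transform to the previous layout.
    """
    n = len(dissim)
    rng = random.Random(seed)
    pts = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
    for _ in range(iterations):
        new_pts = []
        for i in range(n):
            x = y = 0.0
            for j in range(n):
                if i == j:
                    continue
                dx = pts[i][0] - pts[j][0]
                dy = pts[i][1] - pts[j][1]
                dist = math.hypot(dx, dy)
                if dist > 1e-12:
                    # Pull or push point i so its distance to j
                    # moves toward the target dissimilarity.
                    ratio = dissim[i][j] / dist
                    x += ratio * dx
                    y += ratio * dy
            new_pts.append((x / n, y / n))
        pts = new_pts
    return pts


def layout_documents(docs):
    """Return one (x, y) screen position per document."""
    vectors = [term_vector(d) for d in docs]
    dissim = [[cosine_distance(a, b) for b in vectors] for a in vectors]
    return project_2d(dissim)
```

Calling `layout_documents` on a list of strings returns one 2D position per document; documents that share more vocabulary should land closer together, which is the "closer, the more similar" rule. A practical system would likely replace the raw term counts with a weighted model such as TF-IDF and use a library implementation of the projection.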