Interactive optimization of embedding-based text similarity calculations

Daniel Witschard, Ilir Jusufi, Rafael M. Martins, Kostiantyn Kucher, Andreas Kerren
DOI: https://doi.org/10.1177/14738716221114372
IF: 2.174
2022-08-05
Information Visualization (Ahead of Print)
Abstract: Comparing text documents is an essential task for a variety of applications across diverse research fields, and several methods have been developed for it. However, calculating text similarity is an ambiguous and context-dependent task, so many open challenges remain. In this paper, we present a novel method for text similarity calculations based on combining embedding technology with ensemble methods. We show that using several embeddings, instead of only one, can yield higher-quality similarity results, which in turn is a key factor in developing high-performing applications that exploit text similarity. We also provide a prototype visual analytics tool that helps the analyst find the best-performing ensembles and gain insight into the inner workings of the similarity calculations. Furthermore, we discuss how our key ideas generalize to fields beyond the scope of text analysis.
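The abstract does not specify how an ensemble combines its member embeddings, so the following is only a minimal Python sketch under stated assumptions: each embedding contributes a cosine similarity, the ensemble score is their unweighted mean, and the "optimal" ensemble is found by exhaustively scoring every subset of embeddings against reference similarity ratings using Pearson correlation. The embedding names (`emb_x`, `emb_y`), the toy 2-D vectors, and the reference ratings are hypothetical placeholders; this is not the authors' implementation.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def ensemble_similarity(doc_a, doc_b, members):
    """Assumed combination rule: equal-weight mean of per-embedding cosines.

    doc_a, doc_b: dicts mapping a (hypothetical) embedding name to that
    document's vector under that embedding.
    members: the embedding names that make up the ensemble.
    """
    return sum(cosine(doc_a[m], doc_b[m]) for m in members) / len(members)


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def best_ensemble(pairs, reference, names):
    """Try every non-empty subset of embeddings and return the (subset, r)
    whose ensemble scores correlate best with the reference ratings."""
    best = None
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            scores = [ensemble_similarity(a, b, subset) for a, b in pairs]
            r = pearson(scores, reference)
            if best is None or r > best[1]:
                best = (subset, r)
    return best


# Hypothetical toy data: 2-D vectors standing in for real embedding output.
anchor = {"emb_x": [1, 0], "emb_y": [1, 0]}
pairs = [
    (anchor, {"emb_x": [1, 0], "emb_y": [0, 1]}),
    (anchor, {"emb_x": [0, 1], "emb_y": [1, 0]}),
    (anchor, {"emb_x": [1, 1], "emb_y": [1, 1]}),
]
reference = [1.0, 0.0, 0.7]  # made-up "human" similarity ratings

print(ensemble_similarity(*pairs[0], ["emb_x", "emb_y"]))  # 0.5
print(best_ensemble(pairs, reference, ["emb_x", "emb_y"]))
```

Exhaustive search evaluates 2^n − 1 subsets for n embeddings, so it is only practical for a small pool of models; a weighted combination or a smarter search would be natural alternatives under different assumptions.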
computer science, software engineering