Linking Archives Using Document Enrichment and Term Selection

Marc Bron, Bouke Huurnink, Maarten de Rijke
DOI: https://doi.org/10.1007/978-3-642-24469-8_37
2011-01-01
Abstract: News, multimedia and cultural heritage archives are increasingly offering opportunities to create connections between their collections. We consider the task of linking archives: connecting an item in one archive to one or more items in other, often complementary archives. We focus on a specific instance of the task: linking items with a rich textual representation in a news archive to items with sparse annotations in a multimedia archive, where items should be linked if they describe the same or a related event. We find that the difference in textual richness of annotations presents a challenge and investigate two approaches: (i) to enrich sparsely annotated items with textually rich content; and (ii) to reduce rich news archive items using term selection. We demonstrate the positive impact of both approaches on linking to same events and linking to related events.
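The two approaches named in the abstract can be illustrated with a minimal sketch. The abstract does not say which weighting or enrichment method the authors use, so everything below is an assumption made for illustration: term selection is approximated with a smoothed TF-IDF weight, enrichment with a simple term-overlap ranking, and the function names `tokenize`, `select_terms` and `enrich` are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace, keeping alphabetic tokens only."""
    return [t for t in text.lower().split() if t.isalpha()]


def select_terms(doc, background, k=5):
    """Reduce a rich item to its k highest-weighted terms.

    Assumption: smoothed TF-IDF stands in for the paper's term selection.
    """
    tf = Counter(tokenize(doc))
    n = len(background)
    df = Counter()
    for other in background:
        df.update(set(tokenize(other)))  # document frequency per term

    def weight(term):
        return tf[term] * math.log((n + 1) / (df[term] + 1))

    # Sort by descending weight; break ties alphabetically for determinism.
    return sorted(tf, key=lambda t: (-weight(t), t))[:k]


def enrich(sparse_item, rich_docs, top_n=1):
    """Append the text of the top_n rich documents that share the most
    terms with the sparse item.

    Assumption: raw term overlap stands in for the paper's enrichment.
    """
    item_terms = set(tokenize(sparse_item))
    ranked = sorted(
        rich_docs,
        key=lambda d: -len(item_terms & set(tokenize(d))),
    )
    return " ".join([sparse_item] + ranked[:top_n])


news = [
    "the queen visited the flood area",
    "the minister opened the new bridge",
    "the flood damaged the old bridge",
]
print(select_terms(news[0], news, k=2))
print(enrich("queen flood", news))
```

Both functions narrow the gap in textual richness from opposite sides: `select_terms` shortens the rich news item to a handful of discriminative terms, while `enrich` lengthens the sparse multimedia annotation with text from a closely matching rich document.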