An Information Retrieval Algorithm for Massive and Real-time Data

丁伟,林容容,倪良胜
DOI: https://doi.org/10.3321/j.issn:1000-565X.2004.z1.002
2004-01-01
Abstract: With the rapid growth of information resources on networks, information retrieval technologies have become increasingly mature. However, current approaches still have shortcomings when applied to massive, real-time data, and conventional information retrieval algorithms are especially affected. Targeting the massive, real-time network data of the CERNET East China North center, this paper investigates and designs a two-phase vector clustering algorithm. The algorithm achieves high-efficiency information processing through two operations: clustering insertion and clustering optimization. Using a clustering tree, the algorithm is also applied to a group mail discrimination system that filters junk mail from network data. This further improves retrieval efficiency.
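The abstract names the two phases but does not describe them. The Python sketch below shows one common way a two-phase scheme of this kind can work. It is not the authors' algorithm, and it rests on several assumptions:

- Messages are represented as bag-of-words term vectors compared by cosine similarity.
- The insertion phase is a single pass with a hypothetical similarity `threshold`.
- The optimization phase is k-means-style reassignment of vectors to centroids.

The sketch also departs from the paper in two places. It keeps a flat list of clusters rather than the paper's clustering tree. The bulk-mail rule in `flag_bulk` (any cluster with at least `min_size` messages) is a hypothetical stand-in for the paper's discrimination criterion.

```python
import math
from collections import Counter


def to_vector(text):
    """Term-frequency vector from whitespace tokens (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term vectors stored as dicts."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Mean of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: w / len(vectors) for t, w in total.items()}


def insertion_phase(vectors, threshold):
    """Phase 1 (assumed form): one pass over the data. Each vector joins its
    most similar cluster if similarity >= threshold, else opens a new one."""
    clusters, centroids = [], []
    for i, v in enumerate(vectors):
        sims = [cosine(v, c) for c in centroids]
        best = max(range(len(sims)), key=sims.__getitem__, default=None)
        if best is not None and sims[best] >= threshold:
            clusters[best].append(i)
            centroids[best] = centroid([vectors[j] for j in clusters[best]])
        else:
            clusters.append([i])
            centroids.append(dict(v))
    return clusters, centroids


def optimization_phase(vectors, clusters, centroids, max_iter=10):
    """Phase 2 (assumed form): reassign every vector to its most similar
    centroid and recompute centroids until the partition stops changing."""
    for _ in range(max_iter):
        new_clusters = [[] for _ in centroids]
        for i, v in enumerate(vectors):
            best = max(range(len(centroids)), key=lambda k: cosine(v, centroids[k]))
            new_clusters[best].append(i)
        new_clusters = [c for c in new_clusters if c]  # drop emptied clusters
        if new_clusters == clusters:
            break
        clusters = new_clusters
        centroids = [centroid([vectors[j] for j in c]) for c in clusters]
    return clusters, centroids


def flag_bulk(clusters, min_size):
    """Hypothetical rule: a cluster of >= min_size similar messages is bulk mail."""
    return [c for c in clusters if len(c) >= min_size]


mails = [
    "win a free prize now",
    "win a free prize today",
    "meeting notes for project",
    "win free prize now click",
    "lunch plans tomorrow",
]
vecs = [to_vector(m) for m in mails]
clusters, cents = insertion_phase(vecs, threshold=0.5)
clusters, cents = optimization_phase(vecs, clusters, cents)
print(clusters)                         # [[0, 1, 3], [2], [4]]
print(flag_bulk(clusters, min_size=3))  # [[0, 1, 3]]
```

In this sketch, a higher `threshold` produces more, tighter clusters in phase 1. Phase 2 compensates for order effects of the single pass, because a message may be placed before a better-matching cluster exists.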