Efficient Filtering of RSS Documents on Computer Cluster

Haifeng Liu, Milenko Petrovic, Hans-Arno Jacobsen
2008-01-01
Abstract: RSS filtering is increasingly important given the growing amount of information on the Web. Many tools aggregate and manipulate content from around the Web based on the RSS format. Today, clusters are the infrastructure of choice for many large Internet service providers. In this paper we develop algorithms for efficiently filtering RSS documents, which use a graph-structured data format, on a computing cluster. We propose indexing and filtering algorithms and suggest several optimizations. The results indicate that system throughput on a cluster infrastructure increases to 400% of that of a non-clustered, centralized implementation. In general, we observe that the filtering performance of our algorithms scales linearly with the number of compute nodes in the cluster.
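The abstract does not describe the indexing or filtering algorithms themselves. As a rough illustration of what cluster-based filtering can look like, the sketch below hash-partitions subscriptions across simulated nodes and matches each incoming item with a standard counting-based matcher over an inverted index. This is a swapped-in textbook technique, not the authors' method: it assumes items are flattened to field/value pairs (ignoring the graph structure the paper targets), and the names `Subscription`, `FilterNode`, and `Cluster` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Subscription:
    # Hypothetical subscription model: all (field, value) predicates must match.
    sub_id: int
    predicates: frozenset


class FilterNode:
    """One simulated compute node holding a partition of the subscriptions."""

    def __init__(self):
        self.index = defaultdict(set)  # (field, value) -> ids of subscriptions using it
        self.sizes = {}                # subscription id -> number of predicates

    def add(self, sub):
        self.sizes[sub.sub_id] = len(sub.predicates)
        for pred in sub.predicates:
            self.index[pred].add(sub.sub_id)

    def match(self, item):
        # Counting algorithm: tally satisfied predicates per subscription and
        # report those whose every predicate was satisfied by the item.
        hits = defaultdict(int)
        for pred in item.items():
            for sid in self.index.get(pred, ()):
                hits[sid] += 1
        return {sid for sid, n in hits.items() if n == self.sizes[sid]}


class Cluster:
    def __init__(self, num_nodes):
        self.nodes = [FilterNode() for _ in range(num_nodes)]

    def subscribe(self, sub):
        # Hypothetical partitioning: assign each subscription to one node by id.
        self.nodes[sub.sub_id % len(self.nodes)].add(sub)

    def filter(self, item):
        # Every item is sent to every node; on a real cluster the nodes would
        # match in parallel, here they are visited sequentially.
        matched = set()
        for node in self.nodes:
            matched |= node.match(item)
        return matched


cluster = Cluster(num_nodes=4)
cluster.subscribe(Subscription(1, frozenset({("channel", "example.org"), ("category", "sports")})))
cluster.subscribe(Subscription(2, frozenset({("category", "sports")})))
cluster.subscribe(Subscription(3, frozenset({("category", "tech")})))

item = {"channel": "example.org", "category": "sports", "title": "Final score"}
matched = cluster.filter(item)
print(sorted(matched))  # [1, 2]
```

Because each node indexes only its share of the subscriptions, the matching work per node shrinks as nodes are added. Whether the authors partition subscriptions, documents, or both is not stated in the abstract.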