XML Data Compression in Web Publishing

Ruiheng Qiu, Wei Hu, Zhi Tang, Xiaoqing Lu, Lei Zhang
DOI: https://doi.org/10.1117/12.905400
2012-01-01
Abstract: XML is widely used in document formats on the web, but it brings drawbacks: documents are expensive to distribute over the web, and content jumping and rendering are slow, especially on mobile devices. We therefore propose XTrim, a schema-based, efficient, queryable XML compressor that significantly improves the compression ratio by exploiting optimized information in the XML Schema while still supporting efficient queries. First, XTrim extracts structure information from the XML document and its corresponding XML Schema. A novel technique then transforms the tree-like XML structure into a compact indexed form that supports efficient queries. At the same time, the text values are extracted, and a language-based text trim method (LTT), which facilitates language-specific text compressors, is applied to reduce the size of text values in various languages. Within LTT, a word composition detection method is proposed to better process text in non-Latin languages. To evaluate XTrim, we implemented a prototype compressor and query engine. Extensive experiments show that XTrim outperforms XMill and existing queryable alternatives in both compression ratio and query efficiency. Applying XTrim to documents saves up to 30% of storage space and reduces the content jumping and rendering delay from 4 seconds to less than 100 ms.
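The general idea of separating structure from text and keeping the structure in an indexed form can be illustrated with a minimal Python sketch. This is not XTrim's actual algorithm: it does not use an XML Schema or LTT, and the function names, the node-table layout, and the use of `zlib` as the text compressor are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
import zlib


def split_structure_and_text(xml_source):
    """Split an XML document into a tag dictionary, a node table, and packed text.

    Illustrative only (not XTrim): attributes and tail text are ignored.
    """
    root = ET.fromstring(xml_source)
    tag_ids = {}  # tag name -> small integer id
    nodes = []    # preorder rows of (tag_id, parent_index, text_index)
    texts = []    # text values, stored apart from the structure

    def visit(element, parent_index):
        tag_id = tag_ids.setdefault(element.tag, len(tag_ids))
        text = (element.text or "").strip()
        text_index = -1  # -1 marks "no text value"
        if text:
            text_index = len(texts)
            texts.append(text)
        index = len(nodes)
        nodes.append((tag_id, parent_index, text_index))
        for child in element:
            visit(child, index)

    visit(root, -1)  # the root has no parent
    # Assumption: a general-purpose compressor stands in for a
    # language-specific text compressor.
    packed_text = zlib.compress("\x00".join(texts).encode("utf-8"))
    return tag_ids, nodes, packed_text


def query_text_by_tag(tag_ids, nodes, packed_text, tag):
    """Return the text values of all elements with the given tag, in document order."""
    texts = zlib.decompress(packed_text).decode("utf-8").split("\x00")
    wanted = tag_ids.get(tag)
    return [texts[t] for (tid, _parent, t) in nodes if tid == wanted and t >= 0]
```

In this sketch a query such as `query_text_by_tag(tag_ids, nodes, packed_text, "title")` scans only the compact node table and the text container, without re-parsing the original XML. A real queryable compressor would also avoid decompressing the whole text container for each query.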