Time series processing for wrapper maintenance

Yingju Xia, Shu Zhang, Yuhang Yang, Hao Yu
2011-01-01
Abstract: Web pages are increasingly generated dynamically from a common template populated with data from databases. Data is typically extracted from web sources by programs called wrappers. The wrapper maintenance problem arises because a web source may undergo changes that invalidate the current wrappers. This paper presents a wrapper maintenance method that uses tree alignment and time series processing to detect change points in a series of web pages. Tree alignment is used to compute the similarity between the wrapper and the web pages and to build a similarity series. A log-likelihood ratio test is then applied to detect change points in the similarity series. Once a change in the web source is detected, a wrapper generation method is applied to generate a new wrapper. Experimental results show that the method achieves high accuracy and has steady performance. © 2011 IADIS.
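As a rough illustration of the change-point step, the sketch below scans a similarity series for the single split that maximizes a log-likelihood ratio. The abstract does not give the paper's exact formulation, so everything here is an assumption: the Gaussian model for each segment, the function name `llr_change_point`, the `min_seg` parameter, and the sample similarity values are all hypothetical and chosen only for illustration.

```python
import math


def llr_change_point(series, min_seg=2):
    """Find the single split that maximizes a Gaussian log-likelihood ratio.

    Assumption (not from the paper): each segment is modeled as i.i.d.
    Gaussian with its own mean and variance. Returns (split_index, llr),
    where split_index is the first index of the second segment, or
    (None, 0.0) if no split improves on the single-segment model.
    """
    n = len(series)

    def neg_log_lik(seg):
        # Negative Gaussian log-likelihood at the MLE, dropping constants
        # that cancel in the ratio: (m / 2) * log(variance).
        m = len(seg)
        mean = sum(seg) / m
        var = sum((x - mean) ** 2 for x in seg) / m
        var = max(var, 1e-12)  # guard against log(0) on constant segments
        return 0.5 * m * math.log(var)

    base = neg_log_lik(series)  # null model: no change point
    best_k, best_llr = None, 0.0
    for k in range(min_seg, n - min_seg + 1):
        # Alternative model: separate parameters before and after index k.
        llr = 2.0 * (base - neg_log_lik(series[:k]) - neg_log_lik(series[k:]))
        if llr > best_llr:
            best_k, best_llr = k, llr
    return best_k, best_llr


# Hypothetical wrapper-to-page similarity scores; the drop at index 5
# stands in for a template change on the web source.
similarities = [0.95, 0.96, 0.94, 0.95, 0.96, 0.40, 0.42, 0.41, 0.39, 0.40]
split, score = llr_change_point(similarities)
```

In practice the returned statistic would be compared against a threshold before triggering wrapper regeneration; how that threshold is chosen is not stated in the abstract and is left open here.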