Abstract: We propose a new Web page transformation method to facilitate Web browsing on handheld devices such as Personal Digital Assistants (PDAs). In our approach, an original Web page that does not fit on the screen is transformed into a set of subpages, each of which fits on the screen. The transformation iteratively slices the original page into page blocks, taking several factors into account: the size of the screen, the size of each page block, the number of blocks in each transformed page, the depth of the tree hierarchy that the transformed pages form, and the semantic coherence between blocks. We call the tree hierarchy of the transformed pages an SP-tree. In an SP-tree, an internal node consists of a textually enhanced thumbnail image with hyperlinks, and a leaf node is a block extracted from a subpage of the original Web page. We adaptively adjust the fanout and the height of the SP-tree so that each thumbnail image is clear enough for users to read while the number of clicks needed to reach a leaf page stays small. Through this transformation algorithm, we preserve the contextual information in the original Web page and reduce scrolling. We have implemented this transformation module on a proxy server and have conducted usability studies on its performance. Our system achieved a shorter task completion time than the transformations from the Opera browser in nine of ten tasks. The average improvement was 44% on familiar pages and 37% on unfamiliar pages. Subjective responses were positive.
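The trade-off between fanout and height described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it models each page block only by its pixel height, splits blocks into contiguous groups of similar total height, and ignores semantic coherence and thumbnail rendering entirely. The names `SPNode`, `build_sp_tree`, `screen_height`, and `max_fanout` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SPNode:
    """One node of an illustrative SP-tree (hypothetical structure)."""
    blocks: List[int]  # heights in pixels of the page blocks this node covers
    children: List["SPNode"] = field(default_factory=list)

    @property
    def is_leaf(self) -> bool:
        return not self.children


def build_sp_tree(blocks: List[int], screen_height: int, max_fanout: int) -> SPNode:
    """Recursively slice blocks until each leaf fits on the screen.

    Assumption: a single block taller than the screen becomes a leaf as-is.
    """
    if max_fanout < 2:
        raise ValueError("max_fanout must be at least 2")
    node = SPNode(list(blocks))
    if sum(blocks) <= screen_height or len(blocks) <= 1:
        return node  # fits on screen (or cannot be split further): leaf

    k = min(max_fanout, len(blocks))   # number of groups to aim for
    target = sum(blocks) / k           # desired total height per group
    groups, current, acc = [], [], 0
    for i, h in enumerate(blocks):
        current.append(h)
        acc += h
        remaining_blocks = len(blocks) - i - 1
        remaining_groups = k - len(groups) - 1
        # Close the group once it is tall enough, or when every remaining
        # block is needed to fill the remaining groups.
        if remaining_groups > 0 and (acc >= target or remaining_blocks == remaining_groups):
            groups.append(current)
            current, acc = [], 0
    if current:
        groups.append(current)

    node.children = [build_sp_tree(g, screen_height, max_fanout) for g in groups]
    return node


def height(node: SPNode) -> int:
    """Number of clicks from this node down to its deepest leaf."""
    return 0 if node.is_leaf else 1 + max(height(c) for c in node.children)


def leaves(node: SPNode) -> List[SPNode]:
    """Leaf nodes in document order."""
    if node.is_leaf:
        return [node]
    return [leaf for c in node.children for leaf in leaves(c)]


if __name__ == "__main__":
    page = [300, 200, 250, 400, 150, 100]  # made-up block heights
    for fanout in (2, 3):
        tree = build_sp_tree(page, screen_height=320, max_fanout=fanout)
        print(fanout, height(tree), [leaf.blocks for leaf in leaves(tree)])
```

In this toy model a larger `max_fanout` tends to give a shallower tree (fewer clicks) at the cost of more entries per internal node, which in the real system would mean a more crowded thumbnail.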

Detecting and Monitoring Dynamic Content Blocks of a Web Page by Merging its Historical Versions

Web Information Segmentation Method Based on DOM Structure Tree

Using XPath to Discover Informative Content Blocks of Web Pages

Dynamic mining for web navigation patterns based on Markov model

Dynamic web page segmentation method

Identify Temporal Websites Based on User Behavior Analysis

A cognitive crawler using structure pattern for incremental crawling and content extraction

An Efficient Valid Page Crawling Approach for Websites with Dynamic Scripts

Navigation Objects Extraction for Better Content Structure Understanding

The Technology of Extracting Content Information from Web Page Based on DOM Tree

Dynamic Semantic Clustering Approach For Web User Interest

Providing Adaptive Dynamic Web Content in Mobile Environment

Visual Based Content Understanding Towards Web Adaptation

Browsing on small displays by transforming Web pages into hierarchically structured subpages

Constructing Novel Block Layouts for Webpage Analysis

Hidden Webpage Information Extraction Algorithm Using DOM State Transfer

Extracting Content Structure For Web Pages Based On Visual Representation

Content Extraction of Web Pages Based on Characteristic Symbols

Building an Adaptive Site Map Based on Domain and Usage Information

Detecting Malicious Websites in Depth through Analyzing Topics and Web-pages

A hybrid approach for content extraction with text density and visual importance of DOM nodes