Extracting URLs from JavaScript via program analysis.

Qi Wang, Jingyu Zhou, Yuting Chen, Yizhou Zhang, Jianjun Zhao
DOI: https://doi.org/10.1145/2491411.2494583
2013-01-01
Abstract: With the extensive use of client-side JavaScript in web applications, web contents are becoming more dynamic than ever before. This poses significant challenges for search engines, because more web URLs are now embedded or hidden inside JavaScript code and most web crawlers are script-agnostic, significantly reducing the coverage of search engines. We present a hybrid approach that combines static analysis with dynamic execution, overcoming the weakness of a purely static or dynamic approach that either lacks accuracy or suffers from huge execution cost. We also propose to integrate program analysis techniques such as statement coverage and program slicing to improve the performance of URL mining.