Finding your way in the testing jungle: a learning approach to web security testing

Omer Tripp,Omri Weisman,Lotem Guy
DOI: https://doi.org/10.1145/2483760.2483776
2013-07-15
Abstract:Black-box security testing of web applications is a hard problem. The main complication lies in the black-box assumption: The testing tool has limited insight into the workings of server-side defenses. This has traditionally led commercial as well as research vulnerability scanners toward heuristic approaches, such as testing each input point (e.g. HTTP parameter) with a short, predefined list of effective test payloads to balance between coverage and performance. We take a fresh approach to the problem of security testing, casting it into a learning setting. In our approach, the testing algorithm has available a comprehensive database of test payloads, such that if the web application's defenses are broken, then with near certainty one of the candidate payloads is able to demonstrate the vulnerability. The question then becomes how to efficiently search through the payload space to find a good candidate. In our solution, the learning algorithm infers from a failed test---by analyzing the website's response---which other payloads are also likely to fail, thereby pruning substantial portions of the search space. We have realized our approach in XSS Analyzer, an industry-level cross-site scripting (XSS) scanner featuring 500,000,000 test payloads. Our evaluation on 15,552 benchmarks shows solid results: XSS Analyzer achieves >99% coverage relative to brute-force traversal over all payloads, while trying only 10 payloads on average per input point. XSS Analyzer also outperforms several competing algorithms, including a mature commercial algorithm---featured in IBM Security AppScan Standard V8.5---by a far margin. XSS Analyzer has recently been integrated into the latest version of AppScan (V8.6) instead of that algorithm.
What problem does this paper attempt to address?