Web Data Extraction using Hybrid Program Synthesis: A Combination of Top-down and Bottom-up Inference

Mohammad Raza, Sumit Gulwani
DOI: https://doi.org/10.1145/3318464.3380608
2020-06-11
Abstract: Automatic synthesis of web data extraction programs has been explored in a variety of settings, but in practice there remain various robustness and usability challenges. In this work we present a novel program synthesis approach that combines the benefits of deductive and enumerative synthesis strategies, yielding a semi-supervised technique with which concise programs expressible in standard languages can be synthesized from very few examples. We demonstrate improvement over existing techniques in terms of overall accuracy, number of examples required, and program complexity. Our method has been deployed as a web extraction feature in the mass-market Microsoft Power BI product.