From Batch to Stream: Automatic Generation of Online Algorithms

Ziteng Wang, Shankara Pailoor, Aaryan Prakash, Yuepeng Wang, Isil Dillig
DOI: https://doi.org/10.1145/3656418
2024-05-09
Abstract: Online streaming algorithms, tailored for continuous data processing, offer substantial benefits but are often more intricate to design than their offline counterparts. This paper introduces a novel approach for automatically synthesizing online streaming algorithms from their offline versions. In particular, we propose a novel methodology, based on the notion of relational function signature (RFS), for deriving an online algorithm given its offline version. Then, we propose a concrete synthesis algorithm that is an instantiation of the proposed methodology. Our algorithm uses the RFS to decompose the synthesis problem into a set of independent subtasks and uses a combination of symbolic reasoning and search to solve each subproblem. We implement the proposed technique in a new tool called Opera and evaluate it on over 50 tasks spanning two domains: statistical computations and online auctions. Our results show that Opera can automatically derive the online version of the original algorithm for 98% of the tasks. Our experiments also demonstrate that Opera significantly outperforms alternative approaches, including adaptations of SyGuS solvers to this problem as well as two of Opera's own ablations.
Programming Languages
What problem does this paper attempt to address?
### Problems the paper attempts to solve

The paper addresses the problem of automatically converting batch-processing (offline) algorithms into stream-processing (online) algorithms. Specifically, it proposes a method that derives the corresponding online algorithm from a given offline algorithm.

#### Problem background

Online streaming algorithms are designed for continuous data processing. Unlike offline algorithms, they receive and process data points one at a time, without needing access to the entire data set at once. However, designing an online algorithm is usually harder than designing its offline counterpart. For example, the offline algorithm for computing variance is relatively simple, while its online counterpart (such as Welford's algorithm) is considerably more intricate.

#### Research objectives

The main objective of the paper is a fully automatic method that generates the corresponding online algorithm from a given offline algorithm. Specific contributions include:

1. **Proposing a new synthesis methodology based on the relational function signature (RFS)**: The RFS formalizes the relationship between the offline and online algorithms and ensures that the generated online algorithm is semantically equivalent to the original offline algorithm.
2. **Implementing a concrete synthesis algorithm**: This algorithm uses the RFS to decompose the synthesis problem into multiple independent subtasks and combines symbolic reasoning and search to solve each subtask.
3. **Developing a tool named Opera**: This tool implements the method above and has been evaluated on more than 50 tasks spanning two domains: statistical computations and online auctions.
The experimental results show that Opera derives the online version of the offline algorithm for 98% of the tasks and significantly outperforms alternative approaches, including adaptations of SyGuS solvers and two of Opera's own ablations.

#### Technical details

- **RFS inference**: Infer the auxiliary parameters required by the online program, and their corresponding expressions, through static analysis of the offline program.
- **Initializer construction**: Build the initial values for the online scheme according to the RFS.
- **Program sketch generation**: Generate a sketch from the offline program; the sketch contains unknown parts (holes) that still need to be synthesized.
- **Expression synthesis**: For each unknown part in the sketch, find an expression that meets its specification, using a combination of symbolic reasoning and search.

Together, these steps automate the conversion of batch-processing algorithms into stream-processing algorithms, which makes online data processing easier to adopt.
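The variance example from the problem background can be made concrete. The sketch below is hand-written for illustration and is not output produced by Opera. It assumes population variance (dividing by n), and the names `VarianceState`, `initial_state`, `online_update`, and `online_variance` are hypothetical. It shows the three ingredients the technical details refer to: auxiliary values carried between updates, an initializer for them, and an update step that consumes one new element.

```python
from dataclasses import dataclass


def offline_variance(xs):
    """Batch (offline) population variance: needs the whole list at once."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n


@dataclass
class VarianceState:
    """Auxiliary values carried between updates (illustrative names)."""
    n: int = 0         # number of elements seen so far
    mean: float = 0.0  # running mean
    m2: float = 0.0    # running sum of squared deviations from the mean


def initial_state():
    """Initializer: the state before any element has been seen."""
    return VarianceState()


def online_update(state, x):
    """Welford-style update: fold one new element into the state."""
    n = state.n + 1
    delta = x - state.mean
    mean = state.mean + delta / n
    m2 = state.m2 + delta * (x - mean)
    return VarianceState(n, mean, m2)


def online_variance(state):
    """Read the population variance out of the current state."""
    return state.m2 / state.n


if __name__ == "__main__":
    data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
    state = initial_state()
    for x in data:
        state = online_update(state, x)
    # The two results should agree up to floating-point rounding.
    print(offline_variance(data), online_variance(state))
```

The online version never stores the stream. It keeps only three numbers, which is why it is harder to derive by hand: the designer has to discover which auxiliary values (here the count, the running mean, and the running sum of squared deviations) are sufficient to recompute the result after each new element.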