Lazy object copy as a platform for population-based probabilistic programming

Lawrence M. Murray
DOI: https://doi.org/10.48550/arXiv.2001.05293
2020-01-09
Abstract:This work considers dynamic memory management for population-based probabilistic programs, such as those using particle methods for inference. Such programs exhibit a pattern of allocating, copying, potentially mutating, and deallocating collections of similar objects through successive generations. These objects may assemble data structures such as stacks, queues, lists, ragged arrays, and trees, which may be of random, and possibly unbounded, size. For the simple case of $N$ particles, $T$ generations, $D$ objects, and resampling at each generation, dense representation requires $O(DNT)$ memory, while sparse representation requires only $O(DT+DN\log DN)$ memory, based on existing theoretical results. This work describes an object copy-on-write platform to automate this saving for the programmer. The core idea is formalized using labeled directed multigraphs, where vertices represent objects, edges the pointers between them, and labels the necessary bookkeeping. A specific labeling scheme is proposed for high performance under the motivating pattern. The platform is implemented for the Birch probabilistic programming language, using smart pointers, hash tables, and reference-counting garbage collection. It is tested empirically on a number of realistic probabilistic programs, and shown to significantly reduce memory use and execution time in a manner consistent with theoretical expectations. This enables copy-on-write for the imperative programmer, lazy deep copies for the object-oriented programmer, and in-place write optimizations for the functional programmer.
Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?