Data-Driven Solution Portfolios

Marina Drygala, Silvio Lattanzi, Andreas Maggiori, Miltiadis Stouras, Ola Svensson, Sergei Vassilvitskii
2024-12-01
Abstract: In this paper, we consider a new problem of portfolio optimization using stochastic information. In a setting where there is some uncertainty, we ask how to best select $k$ potential solutions, with the goal of optimizing the value of the best solution. More formally, given a combinatorial problem $\Pi$, a set of value functions $V$ over the solutions of $\Pi$, and a distribution $D$ over $V$, our goal is to select $k$ solutions of $\Pi$ that maximize or minimize the expected value of the {\em best} of those solutions. For a simple example, consider the classic knapsack problem: given a universe of elements each with unit weight and a positive value, the task is to select $r$ elements maximizing the total value. Now suppose that each element's weight comes from a (known) distribution. How should we select $k$ different solutions so that one of them is likely to yield a high value? In this work, we tackle this basic problem, and generalize it to the setting where the underlying set system forms a matroid. On the technical side, it is clear that the candidate solutions we select must be diverse and anti-correlated; however, it is not clear how to do so efficiently. Our main result is a polynomial-time algorithm that constructs a portfolio within a constant factor of the optimal.
Data Structures and Algorithms
What problem does this paper attempt to address?
The paper asks how to select a small set of candidate solutions (a solution portfolio) under uncertainty so that the best solution in the set is as good as possible. Formally, the inputs are:

- a combinatorial optimization problem \(\Pi\),
- a set of value functions \(V\), each mapping the solutions of \(\Pi\) to a numerical value,
- a probability distribution \(D\) over \(V\).

The goal is to select \(k\) solutions of \(\Pi\) that maximize (or, for minimization problems, minimize) the expected value of the best of these \(k\) solutions.

### Specific Example

The paper illustrates the problem with the classic knapsack problem: given a set of elements, each with unit weight and a positive value, the task is to select \(r\) elements that maximize the total value. Now assume that the weight of each element comes from a known distribution. How should \(k\) different solutions be selected so that one of them is likely to have a high value? The paper studies this basic problem and generalizes it to the setting where the underlying set system forms a matroid.

### Technical Challenges

The selected candidate solutions must be diverse and anti-correlated, but it is not clear how to achieve this efficiently. The paper's main result is a polynomial-time algorithm that constructs a portfolio within a constant factor of the optimal.

### Application Scenarios

Solution portfolios are useful for speeding up algorithms and in other settings as well. For example, when finding the shortest path between two points in a city where traffic conditions change from day to day, several candidate paths can be precomputed. Under new traffic conditions, the best of these paths can then be evaluated quickly instead of being recomputed from scratch.

### Summary

The paper's core problem is the data-driven construction of a solution portfolio: using historical data, choose \(k\) solutions that optimize the expected value of the best one under uncertainty. The problem arises in practical settings such as the routing example above and raises new theoretical and technical challenges.
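
### Illustrative Sketch

The objective described above, choosing \(k\) solutions \(S_1, \dots, S_k\) to maximize \(\mathbb{E}_{v \sim D}[\max_i v(S_i)]\), can be made concrete for the unit-weight knapsack example. The following is a minimal Python sketch under stated assumptions: the distribution \(D\) is represented by a list of sampled value vectors, the random quantity is each element's contribution to the total (what the abstract calls its weight), and all function names (`solution_value`, `portfolio_value`, `top_r`, `greedy_portfolio`) are hypothetical. The greedy routine is a simple baseline over a restricted candidate pool. It is not the paper's constant-factor algorithm and carries none of its guarantees.

```python
from typing import List, Sequence, Tuple

Solution = Tuple[int, ...]  # sorted indices of the r chosen elements


def solution_value(solution: Solution, values: Sequence[float]) -> float:
    """Total value of one solution under one realized value vector."""
    return sum(values[i] for i in solution)


def portfolio_value(portfolio: Sequence[Solution],
                    samples: Sequence[Sequence[float]]) -> float:
    """Empirical estimate of E[max_i v(S_i)] over the sampled value vectors."""
    if not portfolio:
        return 0.0
    total = 0.0
    for values in samples:
        # Under each scenario we get to use the best solution in the portfolio.
        total += max(solution_value(s, values) for s in portfolio)
    return total / len(samples)


def top_r(values: Sequence[float], r: int) -> Solution:
    """Best single solution for one scenario: the r largest values."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return tuple(sorted(order[:r]))


def greedy_portfolio(samples: Sequence[Sequence[float]],
                     r: int, k: int) -> List[Solution]:
    """Baseline heuristic (an assumption, not the paper's method).

    Candidates are the per-scenario optimal solutions; the portfolio is
    grown by repeatedly adding the candidate that most increases the
    empirical objective.
    """
    candidates = sorted({top_r(values, r) for values in samples})
    portfolio: List[Solution] = []
    for _ in range(k):
        remaining = [c for c in candidates if c not in portfolio]
        if not remaining:
            break
        best = max(remaining,
                   key=lambda c: portfolio_value(portfolio + [c], samples))
        portfolio.append(best)
    return portfolio
```

As a usage example, take two equally likely scenarios over four elements, `[10, 10, 1, 1]` and `[1, 1, 10, 10]`, with `r = 2`. Any single solution is strong in at most one scenario, while `greedy_portfolio(samples, r=2, k=2)` returns one solution per scenario, which is the kind of diversity the summary describes.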