A Discrete Choice Model for Subset Selection

Austin R. Benson,Ravi Kumar,Andrew Tomkins
DOI: https://doi.org/10.1145/3159652.3159702
2018-02-02
Abstract:Multinomial logistic regression is a classical technique for modeling how individuals choose an item from a finite set of alternatives. This methodology is a workhorse in both discrete choice theory and machine learning. However, it is unclear how to generalize multinomial logistic regression to subset selection, allowing the choice of more than one item at a time. We present a new model for subset selection derived from the perspective of random utility maximization in discrete choice theory. In our model, the quality of a subset is determined by the quality of its elements, plus an optional correction. Given a budget on the number of subsets that may receive correction, we develop a framework for learning the quality scores for each item, the choice of subsets, and the correction for each subset. We show that, given the subsets to receive correction, we can efficiently and optimally learn the remaining model parameters jointly. We show further that learning the optimal subsets is both NP-hard and non-submodular, but there are efficient heuristics that perform well in practice. We combine these pieces to provide an overall learning solution and apply it to subset prediction tasks. We find that with reasonably-sized budgets, there are significant gains in average per-choice likelihood ranging from 7% to 8x depending on the dataset and also substantial improvements over a determinantal point process model.
What problem does this paper attempt to address?