Adaptive Portfolio by Solving Multi-armed Bandit Via Thompson Sampling

Mengying Zhu, Xiaolin Zheng, Yan Wang, Yuyuan Li, Qianqiao Liang
2019-01-01
Abstract: As the cornerstone of modern portfolio theory, Markowitz's mean-variance optimization is considered a major model adopted in portfolio management. However, because its parameters are difficult to estimate, it cannot be applied to all periods. In some cases, naive strategies such as Equally-weighted and Value-weighted portfolios can even achieve better performance. Under these circumstances, we can use multiple classic strategies as strategic arms in a multi-armed bandit, which naturally establishes a connection with the portfolio selection problem. The bandit algorithm can then maximize rewards through the trade-off between exploration and exploitation. In this paper, we present a portfolio bandit strategy through Thompson sampling, which aims to make online portfolio choices by effectively exploiting the performances among multiple arms. Also, by constructing multiple strategic arms, we can obtain the optimal investment portfolio to adapt to different investment periods. Moreover, we devise a novel reward function based on users' different investment risk preferences, which can adapt to various investment styles. Our experimental results demonstrate that our proposed portfolio strategy has marked superiority across representative real-world market datasets in terms of extensive evaluation criteria.
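
To make the idea of treating classic strategies as bandit arms concrete, the following is a minimal sketch of Beta-Bernoulli Thompson sampling over strategy arms. It is an illustration under stated assumptions, not the paper's method: the binary reward (1 if the chosen strategy gained in a period) stands in for the risk-preference-based reward function, which the abstract does not specify, and the names `thompson_select`, `run_bandit`, and `arm_returns` are hypothetical.

```python
import random


def thompson_select(successes, failures, rng):
    """Draw one sample from each arm's Beta posterior; return the best arm's index."""
    # Beta(s + 1, f + 1) is the posterior under a uniform Beta(1, 1) prior.
    draws = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=draws.__getitem__)


def run_bandit(arm_returns, seed=0):
    """Run Thompson sampling over strategy arms.

    arm_returns[t][k] is the (hypothetical) return of strategy k in period t.
    Only the chosen arm's return is observed and used for the update.
    """
    rng = random.Random(seed)
    n_arms = len(arm_returns[0])
    successes = [0] * n_arms
    failures = [0] * n_arms
    choices = []
    for period in arm_returns:
        k = thompson_select(successes, failures, rng)
        choices.append(k)
        # Assumed binary reward: 1 if the chosen strategy gained this period.
        if period[k] > 0:
            successes[k] += 1
        else:
            failures[k] += 1
    return choices, successes, failures
```

In this sketch, each arm could be a classic strategy such as mean-variance, Equally-weighted, or Value-weighted. Arms with few observations keep wide posteriors and are still sampled occasionally (exploration), while arms with a strong record are chosen more often (exploitation).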