Improving Portfolio Optimization Results with Bandit Networks

Gustavo de Freitas Fonseca, Lucas Coelho e Silva, Paulo André Lima de Castro
2024-10-08
Abstract: In Reinforcement Learning (RL), Multi-Armed Bandit (MAB) problems have found applications across diverse domains such as recommender systems, healthcare, and finance. Traditional MAB algorithms typically assume stationary reward distributions, which limits their effectiveness in real-world scenarios characterized by non-stationary dynamics. This paper addresses this limitation by introducing and evaluating novel Bandit algorithms designed for non-stationary environments. First, we present the Adaptive Discounted Thompson Sampling (ADTS) algorithm, which enhances adaptability through relaxed discounting and sliding window mechanisms to better respond to changes in reward distributions. We then extend this approach to the Portfolio Optimization problem by introducing the Combinatorial Adaptive Discounted Thompson Sampling (CADTS) algorithm, which addresses computational challenges within Combinatorial Bandits and improves dynamic asset allocation. Additionally, we propose a novel architecture called Bandit Networks, which integrates the outputs of ADTS and CADTS, thereby mitigating computational limitations in stock selection. Through extensive experiments using real financial market data, we demonstrate the potential of these algorithms and architectures in adapting to dynamic environments and optimizing decision-making processes. For instance, the proposed bandit network instances present superior performance when compared to classic portfolio optimization approaches, such as the capital asset pricing model, equal weights, risk parity, and Markowitz, with the best network presenting an out-of-sample Sharpe Ratio 20% higher than the best performing classical model.
Artificial Intelligence, Portfolio Management
What problem does this paper attempt to address?
The main problem this paper addresses is the Multi-Armed Bandit (MAB) problem in non-stationary environments, and in particular how to improve portfolio optimization results so that they adapt to the dynamic changes in financial markets. The paper focuses on three points:

1. **Non-stationary reward distributions**: Traditional MAB algorithms usually assume that the reward distribution is stationary, that is, it does not change over time. In practice, and especially in financial markets, reward distributions shift over time, which sharply reduces the effectiveness of these algorithms.
2. **Portfolio optimization**: Portfolio optimization aims to achieve the best balance between risk and return by selecting assets and allocating capital among them. Traditional methods usually rely on static allocation strategies, which may not respond effectively to changing market conditions, so a method that adjusts asset weights dynamically is needed.
3. **Computational challenges**: In combinatorial problems, computational complexity grows quickly with the number of assets involved, so handling these problems efficiently is a challenge in its own right.

To address these problems, the paper proposes the following methods:

- **Adaptive Discounted Thompson Sampling (ADTS)**: By introducing relaxed discounting and sliding window mechanisms, the ADTS algorithm adapts better to changes in non-stationary reward distributions.
- **Combinatorial Adaptive Discounted Thompson Sampling (CADTS)**: An extension of ADTS to the portfolio optimization problem, CADTS addresses the computational challenges of combinatorial bandits and improves dynamic asset allocation.
- **Bandit Networks**: A new architecture that integrates the outputs of ADTS and CADTS, thereby reducing the computational limitations in stock selection and improving the robustness and accuracy of the model.

Through extensive experiments using real-world financial market data, the paper demonstrates the potential of these algorithms and architectures in adapting to dynamic environments and optimizing decision-making processes. For example, the proposed Bandit Network instances outperform classical portfolio optimization methods such as the Capital Asset Pricing Model (CAPM), equal weights, risk parity, and the Markowitz model, and the out-of-sample Sharpe ratio of the best-performing network instance is 20% higher than that of the best-performing classical model.

In summary, this paper addresses the challenges of portfolio optimization in non-stationary environments by proposing new MAB algorithms and architectures, thereby improving the efficiency and accuracy of financial decision-making.
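The text above does not give the details of ADTS, so the following is only a minimal sketch of the general idea of discounting in Thompson Sampling, not the paper's algorithm. It assumes Bernoulli rewards with Beta posteriors and a single fixed discount factor `gamma`; the class name, the `run_demo` helper, and all parameter values are hypothetical choices made for illustration. The relaxed discounting and sliding window mechanisms of ADTS are not reproduced here.

```python
import random


class DiscountedThompsonSampling:
    """Bernoulli Thompson Sampling with exponential discounting of past evidence.

    Generic illustration only; this is NOT the paper's ADTS algorithm.
    """

    def __init__(self, n_arms, gamma=0.95, seed=None):
        if not 0.0 < gamma <= 1.0:
            raise ValueError("gamma must be in (0, 1]")
        self.gamma = gamma                 # assumed fixed discount factor
        self.successes = [0.0] * n_arms    # discounted success counts
        self.failures = [0.0] * n_arms     # discounted failure counts
        self.rng = random.Random(seed)

    def select_arm(self):
        # Draw one sample per arm from its Beta posterior (Beta(1, 1) prior)
        # and play the arm with the largest sample.
        samples = [
            self.rng.betavariate(1.0 + s, 1.0 + f)
            for s, f in zip(self.successes, self.failures)
        ]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        # Decay the evidence of every arm so old observations lose influence.
        self.successes = [self.gamma * s for s in self.successes]
        self.failures = [self.gamma * f for f in self.failures]
        # Then record the new observation for the arm that was played.
        if reward:
            self.successes[arm] += 1.0
        else:
            self.failures[arm] += 1.0


def run_demo(steps=2000, switch_at=1000, seed=0):
    """Two arms whose success probabilities swap at step `switch_at`.

    Returns the fraction of post-switch steps on which the agent played
    arm 1, which is the better arm after the swap.
    """
    env_rng = random.Random(seed)
    agent = DiscountedThompsonSampling(n_arms=2, gamma=0.95, seed=seed + 1)
    arm1_picks_after_switch = 0
    for t in range(steps):
        probs = (0.8, 0.2) if t < switch_at else (0.2, 0.8)
        arm = agent.select_arm()
        reward = env_rng.random() < probs[arm]
        agent.update(arm, reward)
        if t >= switch_at and arm == 1:
            arm1_picks_after_switch += 1
    return arm1_picks_after_switch / (steps - switch_at)


if __name__ == "__main__":
    print(f"share of post-switch plays on the new best arm: {run_demo():.2f}")
```

With `gamma = 1.0` the sketch reduces to standard Thompson Sampling, which keeps all past evidence and is therefore slow to react when the best arm changes. Smaller values of `gamma` shorten the effective memory, trading some statistical efficiency in stable periods for faster adaptation after a shift.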