Abstract: Investment portfolios, central to finance, balance potential returns and risks. This paper introduces a hybrid approach combining Markowitz's portfolio theory with reinforcement learning, utilizing knowledge distillation for training agents. In particular, our proposed method, called KDD (Knowledge Distillation DDPG), consists of two training stages: a supervised learning stage and a reinforcement learning stage. The trained agents optimize portfolio assembly. A comparative analysis against standard financial models and AI frameworks, using metrics like returns, the Sharpe ratio, and nine evaluation indices, reveals our model's superiority. It notably achieves the highest yield and Sharpe ratio of 2.03, ensuring top profitability with the lowest risk in comparable return scenarios.
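For readers unfamiliar with the metrics named in the abstract, the following is a minimal sketch of how an annualized Sharpe ratio and a maximum drawdown are commonly computed from a series of per-period returns. The function names, the 252-periods-per-year default, and the zero risk-free rate are illustrative assumptions; the paper's exact evaluation settings are not given here.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from per-period simple returns.

    Assumes at least two observations; the annual risk-free rate is
    spread evenly across periods (an illustrative convention).
    """
    excess = [r - risk_free_rate / periods_per_year for r in returns]
    spread = statistics.stdev(excess)  # sample standard deviation
    if spread == 0:
        return float("nan")  # undefined when returns never vary
    return math.sqrt(periods_per_year) * statistics.mean(excess) / spread


def max_drawdown(returns):
    """Largest peak-to-trough loss of the wealth curve, as a positive fraction."""
    wealth, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        wealth *= 1.0 + r
        peak = max(peak, wealth)
        worst = max(worst, 1.0 - wealth / peak)
    return worst
```

A higher Sharpe ratio means more excess return per unit of volatility, which is the sense in which the abstract pairs "top profitability" with "lowest risk".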

What problem does this paper attempt to address?

The problem that this paper attempts to solve is how to optimize the construction and management of financial portfolios by combining Markowitz portfolio theory with reinforcement learning (RL), specifically the Deep Deterministic Policy Gradient (DDPG) algorithm. The paper develops a hybrid method, KDD (Knowledge Distillation DDPG), that aims to improve portfolio performance in real markets by achieving higher returns at lower risk.

### Core problems of the paper:

1. **Balancing returns and risks**: The core problem of portfolio construction is to find an allocation that maximizes expected return while minimizing risk. Although the traditional Markowitz model provides a theoretically optimal solution, it has limitations in practice, especially in complex and changing market environments.
2. **Challenges in the application of reinforcement learning**: Although reinforcement learning has achieved remarkable success in many fields, applying it to portfolio management raises several challenges, such as continuous action spaces, noisy data, and differences between historical data and the real market.
3. **Integrating traditional and modern technologies**: The paper combines the classic Markowitz portfolio theory with modern deep-learning and reinforcement-learning techniques. Through knowledge distillation, the reinforcement-learning model can "learn" effective investment strategies from an existing classic model, thereby improving its performance in the real market.

### Specific objectives:

- **Develop the KDD model**: Through two-stage training (a supervised-learning stage followed by a reinforcement-learning stage), train an agent that can effectively optimize portfolios.
  - **Supervised-learning stage**: Use the Markowitz model as the teacher: the optimal portfolios it generates serve as training targets, and its strategies are transferred to the DDPG model through knowledge distillation.
  - **Reinforcement-learning stage**: In a real-market environment, further optimize the investment decisions of the DDPG model through interaction with the environment.
- **Evaluate the model performance**: Through comparative experiments with traditional financial models and other AI frameworks, verify the superiority of the KDD model on multiple evaluation metrics (such as total return, Sharpe ratio, and maximum drawdown).

### Innovation points of the paper:

- **Application of knowledge distillation**: Knowledge distillation incorporates the experience of the Markowitz model into the DDPG model, so the reinforcement-learning model starts with a reasonable investment strategy. This accelerates learning and improves final performance.
- **Two-stage training method**: The method combines the advantages of supervised learning and reinforcement learning. The model first masters basic investment strategies quickly through supervised learning, and then continues to optimize and adapt to the complex market environment through reinforcement learning.

In summary, the goal of this paper is to develop a more efficient and intelligent portfolio-management method by combining traditional financial theory with modern artificial-intelligence techniques, in order to deal with the complexity and uncertainty of financial markets.
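The supervised (distillation) stage described above can be illustrated with a small sketch. It is not the paper's implementation: the teacher here is a closed-form mean-variance solution with only a budget constraint (short positions allowed), the student is a linear actor fitted by least squares rather than a DDPG network, and the state features, window length, and risk-aversion value are all hypothetical choices. The reinforcement-learning stage, which would fine-tune the actor against a market environment, is omitted.

```python
import numpy as np


def markowitz_weights(mu, cov, risk_aversion=5.0):
    """Teacher: maximize mu'w - (risk_aversion / 2) w'Cov w subject to sum(w) = 1.

    Closed form via a Lagrange multiplier; shorting is allowed. A long-only
    teacher would need a quadratic-programming solver instead (assumption).
    """
    mu = np.asarray(mu, dtype=float)
    ones = np.ones_like(mu)
    inv_mu = np.linalg.solve(cov, mu)      # Cov^{-1} mu
    inv_ones = np.linalg.solve(cov, ones)  # Cov^{-1} 1
    lam = (ones @ inv_mu - risk_aversion) / (ones @ inv_ones)
    return (inv_mu - lam * inv_ones) / risk_aversion


def build_teacher_dataset(returns, window=60):
    """Pair each rolling-window state with the teacher's target weights."""
    n_assets = returns.shape[1]
    states, targets = [], []
    for t in range(window, returns.shape[0]):
        hist = returns[t - window:t]
        mu = hist.mean(axis=0)
        # Small ridge term keeps the covariance matrix invertible.
        cov = np.cov(hist, rowvar=False) + 1e-6 * np.eye(n_assets)
        # Hypothetical state: per-asset mean and volatility over the window.
        states.append(np.concatenate([mu, hist.std(axis=0)]))
        targets.append(markowitz_weights(mu, cov))
    return np.array(states), np.array(targets)


def distill_linear_actor(states, targets):
    """Stage 1 (supervised): fit a linear actor to the teacher's weights (MSE)."""
    X = np.hstack([states, np.ones((states.shape[0], 1))])  # add bias column
    W, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return W


def actor_weights(states, W):
    """Student policy: map states to portfolio weights."""
    X = np.hstack([states, np.ones((states.shape[0], 1))])
    return X @ W
```

In this sketch the distilled actor would then serve as the starting policy for the reinforcement-learning stage, which is the role the summary assigns to the supervised stage: giving the agent a sensible strategy before it begins interacting with the environment.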

Markowitz Meets Bellman: Knowledge-distilled Reinforcement Learning for Portfolio Management

Model-based Deep Reinforcement Learning for Dynamic Portfolio Optimization

Inverse Reinforcement Learning with Multiple Ranked Experts

Deep Reinforcement Learning Model for Stock Portfolio Management Based on Data Fusion

Reinforcement Learning-Based Multimodal Model for the Stock Investment Portfolio Management Task

Bridging the gap between Markowitz planning and deep reinforcement learning

Online Optimal Investment Portfolio Model Based on Deep Reinforcement Learning

A Deep Reinforcement Learning Framework For Financial Portfolio Management

Optimistic Bull or Pessimistic Bear: Adaptive Deep Reinforcement Learning for Stock Portfolio Allocation

A Novel Experts Advice Aggregation Framework Using Deep Reinforcement Learning for Portfolio Management

Combining Transformer based Deep Reinforcement Learning with Black-Litterman Model for Portfolio Optimization

Explainable Post hoc Portfolio Management Financial Policy of a Deep Reinforcement Learning agent

A Deep Reinforcement Learning Approach for Portfolio Management in Non‐Short‐Selling Market

Evaluation of Deep Reinforcement Learning Algorithms for Portfolio Optimisation

Risk Sensitive Distributional Soft Actor Critic for Portfolio Management

A General Framework on Enhancing Portfolio Management with Reinforcement Learning

Explainable Deep Reinforcement Learning for Portfolio Management: An Empirical Approach

CAD: Clustering And Deep Reinforcement Learning Based Multi-Period Portfolio Management Strategy

DeepTrader: A Deep Reinforcement Learning Approach for Risk-Return Balanced Portfolio Management with Market Conditions Embedding

The Design and Implementation of Quantum Finance-based Hybrid Deep Reinforcement Learning Portfolio Investment System

Developing A Multi-Agent and Self-Adaptive Framework with Deep Reinforcement Learning for Dynamic Portfolio Risk Management