Abstract: While research on reinforcement learning applied to financial markets predominantly concentrates on finding optimal behaviours, it is worth noting that the reinforcement learning returns $G_t$ and the state value functions are of interest in themselves and play a pivotal role in the evaluation of assets. Instead of focussing on the more complex task of finding optimal decision rules, this paper studies and applies distributional state value functions in the context of financial market valuation and machine-learning-based trading algorithms. Accurate and trustworthy estimates of the distribution of $G_t$ provide a competitive edge, leading to better informed decisions and better behaviour. Ideas from predictive knowledge and deep reinforcement learning are combined to introduce a novel family of models called CDG-Model, resulting in a highly flexible framework and an intuitive approach with minimal assumptions regarding the underlying distributions. The models allow factors that commonly complicate financial modelling, such as transaction costs, slippage and other possible costs or benefits, to be integrated directly into the model calculation. They can be applied to any kind of trading strategy or asset class. The frameworks introduced provide concrete business value through their potential in the market valuation of single assets and portfolios, in the comparison of strategies, and in the improvement of market timing. They can improve the performance and enhance the learning process of existing or new trading algorithms. They are also of interest from a scientific point of view and open up multiple areas of future research. Initial implementations and tests were performed on real market data. While the results are promising, applying a robust statistical framework to evaluate the models in general remains a challenge, and further investigation is needed.
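To make the quantity $G_t$ concrete, the following is a minimal sketch of how discounted returns net of trading costs could be computed for a finite episode. It is not the CDG-Model itself; it assumes the standard discounted-return definition $G_t = \sum_{k \ge 0} \gamma^k R_{t+k+1}$, and the names `net_rewards`, `discounted_returns`, `pnl`, `traded` and `cost_per_unit` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def net_rewards(
    pnl: Sequence[float], traded: Sequence[float], cost_per_unit: float
) -> List[float]:
    """Per-step reward: raw profit and loss minus a linear transaction cost.

    Assumption: cost is proportional to the absolute quantity traded.
    Slippage or other costs/benefits could be subtracted the same way.
    """
    return [p - cost_per_unit * abs(q) for p, q in zip(pnl, traded)]


def discounted_returns(rewards: Sequence[float], gamma: float = 0.99) -> List[float]:
    """Return G_t for every step t of a finite episode.

    rewards[t] is taken to be the reward received after step t (R_{t+1}),
    so G_t = rewards[t] + gamma * G_{t+1}, with G beyond the episode equal to 0.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each G_t reuses the already computed G_{t+1}.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```

Collecting such $G_t$ samples over many episodes that start from comparable states gives an empirical picture of the return distribution, which is the object a distributional state value function aims to estimate.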

The Gambler's Problem and Beyond.

Gradient $Q(\sigma, \lambda)$: A Unified Algorithm with Function Approximation for Reinforcement Learning

On the continuity and smoothness of the value function in reinforcement learning and optimal control

Approximate optimality and the risk/reward tradeoff given repeated gambles

A Lattice of Gambles

A puzzle of roulette gambling

Distinguishing Risk Preferences using Repeated Gambles

Learning Thresholds with Latent Values and Censored Feedback

A symbolic computational approach to the generalized gambler's ruin problem in one and two dimensions

Extended gambler's ruin problem

How to Gamble If You're In a Hurry

A random journey through the math of gambling

Exploiting Distributional Value Functions for Financial Market Valuation, Enhanced Feature Creation and Improvement of Trading Algorithms

Value Maximization under Stochastic Quasi-Hyperbolic Discounting

Gambler's ruin estimates on finite inner uniform domains

Statistical inference of the value function for reinforcement learning in infinite‐horizon settings

Guarantees for Epsilon-Greedy Reinforcement Learning with Function Approximation

Beyond Exponentially Discounted Sum: Automatic Learning of Return Function

An Information-Theoretic Optimality Principle for Deep Reinforcement Learning

A General Framework for Analyzing Stochastic Dynamics in Learning Algorithms

Learning to Optimally Stop a Diffusion Process