Matrix Low-Rank Approximation For Policy Gradient Methods

Sergio Rozada, Antonio G. Marques
2024-05-28
Abstract: Estimating a policy that maps states to actions is a central problem in reinforcement learning. Traditionally, policies are inferred from the so-called value functions (VFs), but exact VF computation suffers from the curse of dimensionality. Policy gradient (PG) methods bypass this by directly learning a parametric stochastic policy. Typically, the parameters of the policy are estimated using neural networks (NNs) tuned via stochastic gradient descent. However, finding adequate NN architectures can be challenging, and convergence issues are common as well. In this paper, we put forth low-rank matrix-based models to efficiently estimate the parameters of PG algorithms. We collect the parameters of the stochastic policy into a matrix, and then we leverage matrix-completion techniques to promote (enforce) low rank. We demonstrate via numerical studies how low-rank matrix-based policy models reduce the computational and sample complexities relative to NN models, while achieving a similar aggregated reward.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The main problem this paper addresses is the computational and sample complexity of policy gradient (PG) methods in reinforcement learning (RL). PG methods usually rely on neural networks (NNs) to estimate the policy parameters, which raises the following challenges:
1. **Difficulty in architecture selection**: Finding a suitable NN architecture is hard, and different tasks require different architectures.
2. **Convergence problems**: NN training often runs into convergence issues, especially in high-dimensional state spaces.
3. **Computational and sample complexity**: NN models usually have a large number of parameters, which leads to high computational cost and requires a large amount of sample data.
To address these problems, the paper proposes a Low-Rank Policy Gradient (LRPG) method. The policy parameters are organized into matrices, and matrix-completion techniques are used to promote low-rank structure, which reduces the number of parameters and improves the generalization ability of the model. The main contributions of the paper are:
- **Low-rank matrix modeling**: The mean and standard deviation parameters of the policy are represented as low-rank matrices, which reduces the number of parameters and alleviates the curse of dimensionality.
- **Efficient parameter estimation**: Low-rank matrix decomposition techniques are used for parameter estimation, which reduces computational and sample complexity.
- **Experimental verification**: Experiments on three standard continuous-action reinforcement learning tasks verify the effectiveness of the LRPG method and show its advantages in parameter efficiency, convergence speed, and return.
Through these improvements, the LRPG method achieves cumulative rewards similar to those of neural-network-based methods while showing significant advantages in the number of parameters and convergence speed.
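
To make the low-rank idea more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a state space discretized into an `n_rows` by `n_cols` grid, a Gaussian policy whose mean matrix is factorized as the product of two thin factors, a fixed standard deviation (the summary above says the standard deviation is also modeled as a low-rank matrix, which is omitted here for brevity), and a plain REINFORCE-style update. All function and variable names (`init_factors`, `mean_action`, `reinforce_step`, `num_params`) are hypothetical.

```python
import math
import random


def init_factors(n_rows, n_cols, rank, scale=0.1, seed=0):
    """Random thin factors: left is n_rows x rank, right is n_cols x rank."""
    rng = random.Random(seed)
    left = [[rng.gauss(0.0, scale) for _ in range(rank)] for _ in range(n_rows)]
    right = [[rng.gauss(0.0, scale) for _ in range(rank)] for _ in range(n_cols)]
    return left, right


def mean_action(left, right, i, j):
    """Entry (i, j) of the low-rank mean matrix, i.e. dot(left[i], right[j])."""
    return sum(l * r for l, r in zip(left[i], right[j]))


def log_prob(action, mu, sigma):
    """Log-density of a scalar Gaussian policy N(mu, sigma^2) at `action`."""
    return (-0.5 * ((action - mu) / sigma) ** 2
            - math.log(sigma) - 0.5 * math.log(2.0 * math.pi))


def reinforce_step(left, right, i, j, action, ret, sigma, lr):
    """One stochastic gradient-ascent step on ret * log pi(action | state (i, j)).

    Only row i of `left` and row j of `right` are touched, because the mean
    at state (i, j) depends on no other rows.
    """
    mu = mean_action(left, right, i, j)
    g = ret * (action - mu) / sigma ** 2  # ret * d log_prob / d mu
    old_left = list(left[i])              # copy so both rows use pre-update values
    for k in range(len(old_left)):
        left[i][k] += lr * g * right[j][k]
        right[j][k] += lr * g * old_left[k]


def num_params(n_rows, n_cols, rank):
    """Parameters in the factorized mean, versus n_rows * n_cols for a full table."""
    return rank * (n_rows + n_cols)
```

Under these assumptions, a 100 by 100 grid with rank 2 needs `num_params(100, 100, 2) = 400` parameters for the mean, compared with 10,000 for an unconstrained table. This is the kind of parameter saving the summary refers to; the actual model, rank selection, and training procedure are specified in the paper.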