Experimental and Theoretical Analysis of Reinforcement Learning Algorithms
- Publication Date: Jul 01, 2024
- Source: HAL
- Language: English
- License: Unknown
Abstract
In Reinforcement Learning (RL), an agent learns how to act in an unknown environment in order to maximize its reward in the long run. In recent years, the use of neural networks has led to breakthroughs, e.g., in scalability. However, there are still gaps in our understanding of how best to employ neural networks in RL. In this thesis, we improve the usability of neural networks in RL in two ways, presented in two separate parts. First, we present a theoretical analysis of the influence of the number of parameters on learning performance. Second, we propose a simple feature preprocessing based on the Fourier series, which empirically improves performance in several ways.

In the first part of this thesis, we study how the number of parameters influences performance. While in supervised learning the regime of over-parameterization and its benefits are well understood, the situation in RL is much less clear. We present a theoretical analysis of the influence of network size and L2 regularization on performance. We identify the ratio between the number of parameters and the number of visited states as a crucial factor and define over-parameterization as the regime in which this ratio is larger than one. Furthermore, we observe a double descent phenomenon, i.e., a sudden drop in performance around a parameter/state ratio of one. Our analysis is based on the regularized Least-Squares Temporal Difference (LSTD) algorithm with random features in an asymptotic regime in which both the number of parameters and the number of states go to infinity while maintaining a constant ratio. We derive deterministic limits of the empirical Mean-Squared Bellman Error (MSBE), the true MSBE, and the true Mean-Squared Value Error (MSVE), which feature correction terms responsible for the double descent. We show that these correction terms vanish when the L2 regularization increases or the number of unvisited states goes to zero.

In the second part of this thesis, we study the preprocessing of features through a Fourier series. In practice, not only the number of parameters but also the amount of optimization that can be performed is limited. Neural networks thus behave as under-parameterized models that are also regularized through early stopping. This regularization induces a spectral bias, since fitting the high-frequency components of the value function requires exponentially more gradient update steps than fitting the low-frequency ones. We propose a simple Fourier mapping for preprocessing, which improves the learning of high-frequency components and thus helps to overcome the spectral bias in RL. We present experiments indicating that this can lead to significant performance gains in terms of rewards and sample efficiency. Furthermore, we observe that this preprocessing increases robustness with respect to hyperparameters, leads to smoother policies, and benefits the training process by reducing learning interference, encouraging sparsity, and increasing the expressiveness of the learned features.
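
The setting of the first part can be illustrated with a short sketch. The following minimal NumPy example runs regularized LSTD with random features on synthetic data; the state representations, rewards, and the tanh random-feature map are hypothetical stand-ins rather than the thesis's exact model, and the asymptotic analysis is of course not reproduced. It only shows how the parameter/state ratio N/m and the L2 penalty enter the estimator and the empirical MSBE.

```python
import numpy as np

rng = np.random.default_rng(0)

# Problem size: m visited states, N random features; the thesis studies the
# regime where both grow large with the ratio N/m held constant.
m, N, d = 200, 400, 10            # visited states, random features, raw state dim
gamma, reg = 0.95, 1e-2           # discount factor, L2 regularization strength

# Hypothetical raw state vectors and their successors under a fixed policy.
S = rng.normal(size=(m, d))
S_next = rng.normal(size=(m, d))

# Random features: a fixed random projection followed by a nonlinearity,
# i.e., an untrained single-hidden-layer network.
W = rng.normal(size=(d, N)) / np.sqrt(d)
phi = np.tanh(S @ W)              # (m, N) feature matrix
phi_next = np.tanh(S_next @ W)

rewards = rng.normal(size=m)      # stand-in rewards for illustration

# Regularized LSTD: solve (Phi^T (Phi - gamma Phi') + reg * I) theta = Phi^T r.
A = phi.T @ (phi - gamma * phi_next) + reg * np.eye(N)
b = phi.T @ rewards
theta = np.linalg.solve(A, b)

# Empirical Mean-Squared Bellman Error over the visited states.
td_residual = rewards + gamma * (phi_next @ theta) - phi @ theta
msbe_hat = np.mean(td_residual ** 2)
print(f"parameter/state ratio N/m = {N / m:.2f}, empirical MSBE = {msbe_hat:.4f}")
```

Sweeping N across the ratio N/m = 1 in this kind of setup is how one would probe the double descent behavior described above.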
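
For the second part, the sketch below shows one standard way to realize a Fourier-series feature mapping: a cosine basis over integer coefficient vectors, applied to the state before it is fed to a value network. The thesis's exact mapping may differ; the function name and the `order` parameter here are illustrative assumptions.

```python
import numpy as np

def fourier_features(x, order=3):
    """Map a state vector x in [0, 1]^d to Fourier basis features.

    Each output is cos(pi * c . x) for an integer coefficient vector c with
    entries in {0, ..., order}. Higher-order terms make high-frequency
    components of the value function easier to fit, counteracting the
    spectral bias of gradient-trained networks.
    """
    d = x.shape[-1]
    # All coefficient vectors c in {0, ..., order}^d.
    grids = np.meshgrid(*[np.arange(order + 1)] * d, indexing="ij")
    C = np.stack([g.ravel() for g in grids], axis=-1)   # ((order+1)^d, d)
    return np.cos(np.pi * x @ C.T)

# Example: preprocess a 2-D state before feeding it to a value network.
state = np.array([0.3, 0.7])
features = fourier_features(state, order=3)
print(features.shape)   # (16,) = (3 + 1) ** 2 basis functions
```

Note that the number of basis functions grows exponentially with the state dimension, so in practice such mappings are applied to low-dimensional inputs or with a restricted set of coefficient vectors.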