Tuesday, October 20 @ 11:00 AM - 12:30 PM (ONLINE)
Improving reinforcement learning algorithms using an optimal learning rate policy
Othmane Mounjid, École Polytechnique
ABSTRACT: We investigate to what extent one can improve reinforcement learning algorithms. To this end, we first show that the classical asymptotic convergence rate O(1/√N) is pessimistic and can be replaced by O((log(N)/N)^β) with β in [0.5, 1], where N is the number of iterations. Second, we propose a dynamic optimal policy for the choice of the learning rate. We decompose our policy into two interacting levels: the inner level and the outer level. In the inner level, we present an algorithm which, starting from a predefined learning rate, constructs a new one whose error decreases faster. In the outer level, we propose an optimal methodology for selecting the predefined learning rate. Finally, we show empirically that our learning rate selection methodology outperforms standard algorithms used in reinforcement learning (RL) in three applications: drift estimation, optimal placement of limit orders, and optimal execution of a large number of shares.
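
To give a feel for the two convergence rates quoted in the abstract, the following minimal sketch evaluates 1/√N and (log(N)/N)^β for a few values of N. It is only an illustration of the bounds, not the speaker's algorithm: the function names `classical_rate` and `improved_rate` are hypothetical, the exponent is written `beta`, and the constants hidden by the O(·) notation are ignored.

```python
import math


def classical_rate(n: int) -> float:
    """Classical bound 1/sqrt(N), ignoring the constant hidden by O(.)."""
    return 1.0 / math.sqrt(n)


def improved_rate(n: int, beta: float) -> float:
    """Bound (log(N)/N)**beta from the abstract, ignoring constants.

    The abstract states beta in [0.5, 1]; values outside are rejected.
    """
    if not 0.5 <= beta <= 1.0:
        raise ValueError("beta is expected to lie in [0.5, 1]")
    return (math.log(n) / n) ** beta


# Print both bounds side by side for a few iteration counts.
for n in (10**2, 10**4, 10**6):
    row = ", ".join(
        f"beta={beta}: {improved_rate(n, beta):.2e}" for beta in (0.5, 0.75, 1.0)
    )
    print(f"N={n}: classical={classical_rate(n):.2e}, {row}")
```

With constants ignored, the (log(N)/N)^β bound is smaller than 1/√N for large N once β is above 0.5; at β = 0.5 the logarithmic factor makes it slightly larger, so the improvement comes from the larger exponents.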