Deep Reinforcement Learning
B. PIOT, C. TALLEC, F. STRUB, J.B. GRILL
Deep Learning

Prerequisites

● Theoretical tools: linear algebra, basic probability, reinforcement learning.
● Programming tools: familiarity with Python.

Course objective

Be it on Atari games, Go, chess, StarCraft II or Dota, Deep Reinforcement Learning (DRL) has
opened up reinforcement learning to a variety of large-scale applications. While it may
formally appear to be a straightforward extension of reinforcement learning to deep-learning-based
function approximation, DRL often involves more than simply plugging the newest deep
learning architecture into the best theoretical reinforcement learning method. In this course, we
will journey through the recent history of DRL, from the now seminal Neural Fitted Q, to the
popular Deep Q-Network (DQN) and Asynchronous Actor-Critic, to the more recent MuZero. At
each milestone, we will emphasize both the theoretical and practical considerations underlying
the improvements in each algorithm. These range from addressing overestimation bias, to
correcting for off-policyness in multi-step return estimates, to properly regularizing policy optimization
steps. Alongside the lectures, the practical sessions will revolve around implementing and testing
DRL algorithms in JAX and Haiku on simple environments. By the end of the course, students
should have a good understanding of the broad principles underlying the design of DRL
algorithms, as well as hands-on experience implementing them.

Session organization

● 6 lectures of 2 hours each
● 6 coding sessions (TP/TD) of 2 hours each

Assessment method

Assessment:
● Class project (60%) and reports/implementations from the practical sessions (40%). The project may
consist of solving a small- to medium-scale RL problem using the tools and knowledge
acquired during the course. The project will be evaluated on the submitted code, a short
report and a presentation.
Resit:
● The resit consists of handing in the class project after the
initial deadline, with the total grade capped at 12/20.

Instructors

Bilal PIOT

Corentin TALLEC

Florian STRUB

Jean-Bastien GRILL
