Reinforcement learning


Prerequisites

Basics of probability and statistics (L3 mathematics or GE level)

Course objective

An introduction to the models and mathematical tools used to formalize the problem of learning and decision-making under uncertainty. The course focuses in particular on two frameworks: reinforcement learning and multi-armed bandits.

Session organization

  • 8 lectures of 2 hours each
  • 3 tutorial sessions of 3 hours each

Assessment

The course is assessed through a project: reading papers of interest, and implementing or theoretically analyzing reinforcement learning algorithms. The project is evaluated on the basis of a short report and an oral presentation.


References

  • Processus décisionnels de Markov en intelligence artificielle, edited by O. Sigaud and O. Buffet, 2008.
  • Neuro-Dynamic Programming, D. Bertsekas and J. Tsitsiklis, 1996.

Topics covered

  • Historical and multidisciplinary foundations of reinforcement learning
  • Markov decision processes and dynamic programming
  • Stochastic approximation and Monte Carlo methods
  • Function approximation and statistical learning theory
  • Approximate dynamic programming
  • Introduction to stochastic and adversarial multi-armed bandits
  • Learning rates and finite-sample analysis

Instructors

