Sequential learning
P. GAILLARD, E. BOURSIER
LearningTheory

Prerequisites

Basic notions of probability and optimization.

Course objectives

In online learning, data are acquired and processed on the fly: feedback is received and algorithms are updated as the data arrive. The field has received a lot of attention recently because of its many applications on the internet, which include choosing which ads to display, repeated auctions, spam detection, and expert/algorithm aggregation (and boosting).
The objective of the course (taught in English) is to introduce and study the main concepts of online learning (regret, calibration, etc.), construct algorithms, and show connections with game theory.

We will also cover the bandit setting (cf. the Reinforcement Learning course) and its generalization, partial monitoring.

Class organization

Six blackboard lectures.

Assessment

Homework assignment
Final Exam

References

N. Cesa-Bianchi and G. Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.

S. Bubeck and N. Cesa-Bianchi. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. Foundations and Trends in Machine Learning, 5(1):1-122, 2012.

V. Perchet. Approachability, Regret and Calibration: Implications and Equivalences. Journal of Dynamics and Games, 1:181-254, 2014.

T. Lattimore and C. Szepesvári. Bandit Algorithms. Cambridge University Press, 2020.

Topics covered

* Regret minimization
* Calibration
* Exponential weights algorithms
* Stochastic optimization
* Game theory

Instructors

Pierre GAILLARD

(INRIA)

Etienne BOURSIER

(INRIA)
