Notions of probability and optimization.
Course objective
In online learning, data are acquired and processed on the fly; feedback is received and algorithms are updated on the fly. The field has received a lot of attention recently because of its potential applications on the internet, including choosing which ads to display, repeated auctions, spam detection, and expert/algorithm aggregation (and boosting).
The objective of the course (taught in English) is to introduce and study the main concepts of online learning (regret, calibration, etc.), construct algorithms, and show the connections with game theory.
We will also cover the bandit setting (cf. the Reinforcement Learning course) and its generalization, partial monitoring.
Session organization
Assessment
Prediction, Learning, and Games. Nicolò Cesa-Bianchi and Gábor Lugosi. Cambridge University Press, 2006.
Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. S. Bubeck and N. Cesa-Bianchi. Foundations and Trends in Machine Learning, Vol. 5, No. 1, 1-122, 2012.
Approachability, Regret and Calibration: Implications and Equivalences. V. Perchet. Journal of Dynamics and Games, 1:181-254, 2014.
Bandit Algorithms. T. Lattimore and C. Szepesvári. Cambridge University Press, 2020.
* Regret minimization
* Exponential weights algorithms
* Stochastic Optimization
* Game Theory