Prerequisites
Introductory courses on probability, and on computer programming and data structures.
Course objective
An introductory course on kernel methods for machine learning.
Many problems in real-world applications of machine learning can be formalized as classical statistical problems, e.g., pattern recognition, regression or dimension reduction, with the caveat that the data are often not vectors of numbers. For example, protein sequences and structures in computational biology, text and XML documents in web mining, segmented pictures in image processing, or time series in speech recognition and finance, have particular structures which contain relevant information for the statistical problem but can hardly be encoded into finite-dimensional vector representations.
Kernel methods are a class of algorithms well suited for such problems. Indeed, they extend the applicability of many statistical methods initially designed for vectors to virtually any type of data, without the need for explicit vectorization of the data. The price to pay for this extension to non-vectors is the need to define a so-called positive definite kernel function between the objects, formally equivalent to an implicit vectorization of the data. The “art” of kernel design for various objects has witnessed important advances in recent years, resulting in many state-of-the-art algorithms and successful applications in many domains.
The goal of this course is to present the mathematical foundations of kernel methods, as well as the main approaches that have emerged so far in kernel design. We will start with a presentation of the theory of positive definite kernels and reproducing kernel Hilbert spaces, which will allow us to introduce several kernel methods, including kernel principal component analysis and support vector machines. Then we will come back to the problem of defining the kernel. We will present the main results about Mercer kernels and semigroup kernels, as well as a few examples of kernels for strings and graphs, taken from applications in computational biology, text processing and image analysis. Finally, we will touch upon topics of active research, such as large-scale kernel methods and deep kernel machines.
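To give a concrete flavor of these topics, below is a minimal sketch (in Python with NumPy, not taken from the course material) of kernel ridge regression with a Gaussian kernel: the algorithm accesses the data only through pairwise kernel evaluations, which is the mechanism that lets the same method apply to non-vectorial data once a positive definite kernel is defined. Function names and parameter values are illustrative.

import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    # Gram matrix K[i, j] = exp(-||x_i - y_j||^2 / (2 sigma^2))
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-sq_dists / (2.0 * sigma**2))

def kernel_ridge_fit(X, y, lam=0.1, sigma=1.0):
    # Solve (K + lam * n * I) alpha = y; by the representer theorem the
    # solution lies in the span of the kernel functions at the training points.
    n = X.shape[0]
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * n * np.eye(n), y)

def kernel_ridge_predict(X_train, X_test, alpha, sigma=1.0):
    # f(x) = sum_i alpha_i k(x_i, x)
    return gaussian_kernel(X_test, X_train, sigma) @ alpha

# Toy usage: regress y = sin(x) from noisy samples.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)
alpha = kernel_ridge_fit(X, y, lam=0.01, sigma=0.5)
y_hat = kernel_ridge_predict(X, X, alpha, sigma=0.5)

The same fitting and prediction code would work unchanged on strings or graphs, provided gaussian_kernel is replaced by a positive definite kernel defined on those objects.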
Organization of sessions
– For this class, there will be 9 sessions in total, plus a final written exam. We will adopt a hybrid format:
The first and last lectures (lectures 1 and 9) will be full lectures (from 1:30pm to 4pm) at ENS Paris-Saclay: 11/01 and 22/03. There is no other lecture scheduled in Paris, so it should not be an issue to commute to Saclay on those days.
The remaining lectures (lectures 2 to 8) will consist of interactive sessions from 3pm to 4pm. Some of these sessions will be held at ENS Saclay and others will be online (on Zoom).
Before each interactive session (lectures 2 to 8), we assume that you have studied the slides and videos corresponding to that lecture (typically 1.5 hours long).
On the official schedule, the class starts at 1:30pm, which leaves you enough time for this before participating in the one-hour interactive session with the instructors.
During these sessions, we will take questions, discuss solutions to the homework assignments, etc. They will take place on Wednesdays from 3pm to 4pm.
– Interactive sessions at ENS Paris-Saclay: corresponding to lectures 4 and 6 (08/02 and 01/03). In these sessions, we will discuss the solutions to the homework.
– Interactive sessions on Zoom: corresponding to lectures 2, 3, 5, 7 and 8.
The final exam will be at ENS Paris-Saclay on 29/03.
Grading
The grade for the class will be based on (i) a final exam (40%), (ii) a data challenge (40%), and (iii) three homework assignments to be submitted online (20%).
References
- N. Aronszajn. Theory of reproducing kernels. Transactions of the American Mathematical Society, 68:337-404, 1950.
- C. Berg, J.P.R. Christensen and P. Ressel. Harmonic analysis on semigroups. Springer, 1994.
- J. Shawe-Taylor and N. Cristianini. Kernel methods for pattern analysis. Cambridge University Press, 2004.
Topics covered
- Positive definite kernels
- Reproducing kernel Hilbert spaces
- Kernel trick
- Representer theorem
- Kernel ridge regression
- Support vector machines
- Kernel k-means and spectral clustering
- Kernel PCA
- Kernel CCA
- Mercer kernels
- Kernels for strings
- Kernels for graphs
- Kernels on graphs
- Multiple kernel learning
- Large-scale learning with kernels
- Deep learning with kernels
- Kernel mean embedding
Julien Mairal (INRIA)
Michael Arbel (INRIA)