Prerequisites
Linear algebra, basic programming in Python
Course objective
Choosing the correct loss, distance and convolution for every problem
Machine learning sits at the intersection of modelling, software engineering and
statistics: data scientists design models that must satisfy domain-specific constraints,
scale up to real data and come with training guarantees.
In this context, geometry is a convenient tool to encode expert knowledge. Clever
distances between data samples allow our methods to leverage key priors while keeping
a simple mathematical form. This is especially relevant when working with strongly
structured samples such as anatomical images, protein data or physical simulations.
The purpose of this class is to provide a unifying geometric perspective on common
machine learning models. By the end of the semester, students will be able to answer
the following three questions:
– Should I represent my data samples using vectors, graphs or histograms?
– How impactful is the choice of a Euclidean norm vs. a cross-entropy loss?
– Can I relate kernels, graphs, manifolds, optimal transport and transformer networks to each other?
Session outline
1. Introduction: why geometry?
– Linear models and the curse of dimensionality
– Geometry as a modelling tool
– A motivating example: to work with triangles, use a sphere!
– Overview of the class
– Lab session: first steps with NumPy, PyTorch and KeOps
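As a taste of this first lab session, here is a minimal sketch that computes the same Gaussian kernel product twice: densely with plain PyTorch tensors, then symbolically with a KeOps LazyTensor that never materializes the N-by-N kernel matrix. The array sizes and the kernel bandwidth are arbitrary placeholders.

    import torch
    from pykeops.torch import LazyTensor  # pip install pykeops

    N, D = 2000, 3  # placeholder sizes
    x = torch.randn(N, D)
    y = torch.randn(N, D)
    b = torch.randn(N, 1)

    # Dense PyTorch version: the full (N, N) kernel matrix lives in memory.
    D2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    a_dense = (-D2 / 2).exp() @ b

    # KeOps version: the same reduction, computed on the fly in O(N) memory.
    x_i = LazyTensor(x[:, None, :])  # virtual shape (N, 1, D)
    y_j = LazyTensor(y[None, :, :])  # virtual shape (1, N, D)
    D2_ij = ((x_i - y_j) ** 2).sum(-1)
    a_keops = (-D2_ij / 2).exp() @ b  # matches a_dense up to float precision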
2. Linear models on curved spaces
– Linear regression
– Riemannian metrics
– The exponential map
– Geodesic regression
– Lab session: geodesics in spaces of matrices
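One possible illustration of this lab session, written with NumPy and SciPy: the exponential map on symmetric positive-definite (SPD) matrices under the affine-invariant metric, which traces a geodesic that never leaves the cone of SPD matrices. The matrices below are arbitrary examples.

    import numpy as np
    from scipy.linalg import expm, inv, sqrtm

    def spd_exp(A, V, t=1.0):
        # Affine-invariant exponential map: the geodesic t -> Exp_A(t V)
        # starting at the SPD matrix A with symmetric velocity V.
        A_half = sqrtm(A).real
        A_half_inv = inv(A_half)
        return A_half @ expm(t * A_half_inv @ V @ A_half_inv) @ A_half

    A = np.array([[2.0, 0.5], [0.5, 1.0]])   # an SPD starting point
    V = np.array([[0.0, 0.3], [0.3, -0.2]])  # a symmetric tangent vector
    for t in (0.0, 0.5, 1.0):
        # Eigenvalues stay positive all along the geodesic.
        print(t, np.linalg.eigvalsh(spd_exp(A, V, t)))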
3. Feature engineering, kernels
– Principal Component Analysis (PCA)
– Feature engineering
– The kernel trick
– Kernel PCA
– Lab session: which kernel should I use?
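A minimal sketch of the kernel trick, assuming scikit-learn: on two concentric circles, linear PCA cannot separate the classes, while kernel PCA with a Gaussian (RBF) kernel can. The gamma value is an illustrative placeholder.

    from sklearn.datasets import make_circles
    from sklearn.decomposition import PCA, KernelPCA

    # Two concentric circles: not linearly separable in the plane.
    X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

    # Linear PCA only rotates the data...
    X_lin = PCA(n_components=2).fit_transform(X)

    # ...while an RBF kernel implicitly maps it to a space where
    # the two circles become (almost) linearly separable.
    X_rbf = KernelPCA(n_components=2, kernel="rbf", gamma=10.0).fit_transform(X)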
4. Graphs, discrete spaces and manifold embeddings
– From local to global structures
– The K-Nearest Neighbors graph
– Graphs vs kernel embeddings?
– Optimizing a faithful representation
– Lab session: the UMAP toolbox
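A short sketch of the local-to-global pipeline, assuming scikit-learn and the umap-learn package: build a sparse K-nearest-neighbor graph on the raw vectors, then let UMAP optimize a faithful 2-d layout. All hyperparameters below are illustrative.

    from sklearn.datasets import load_digits
    from sklearn.neighbors import kneighbors_graph
    import umap  # pip install umap-learn

    X, y = load_digits(return_X_y=True)  # 8x8 digit images as 64-d vectors

    # Local structure: a sparse K-nearest-neighbor graph.
    knn = kneighbors_graph(X, n_neighbors=10, mode="distance")

    # Global structure: UMAP optimizes a 2-d embedding that stays
    # faithful to such a neighborhood graph.
    emb = umap.UMAP(n_neighbors=10, min_dist=0.1, random_state=0).fit_transform(X)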
5. Algorithmic bottlenecks
– Strengths and limitations of GPU hardware
– Optimizing register usage
– Annealing, multiscale methods
– Strategies to “beat” the curse of dimensionality
– Lab session: approximate nearest neighbor search
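One way this lab session could look, assuming the FAISS library: an exact brute-force baseline next to an approximate HNSW graph index, which trades a little recall for a large speed-up. Sizes and parameters are placeholders.

    import numpy as np
    import faiss  # pip install faiss-cpu

    d = 64
    xb = np.random.randn(100_000, d).astype("float32")  # database vectors
    xq = np.random.randn(5, d).astype("float32")        # query vectors

    # Exact baseline: brute-force L2 search over the whole database.
    flat = faiss.IndexFlatL2(d)
    flat.add(xb)
    D_exact, I_exact = flat.search(xq, 5)

    # Approximate search: an HNSW graph index (32 links per node).
    hnsw = faiss.IndexHNSWFlat(d, 32)
    hnsw.add(xb)
    D_approx, I_approx = hnsw.search(xq, 5)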
6. Working with probability distributions
– Histograms, measures and dual-adversarial losses
– Pointwise metrics: cross-entropy, information geometry
– Kernel metrics: Sobolev spaces, Maximum Mean Discrepancies
– Optimal transport: Wasserstein distance and Earth Mover’s problem
– Lab session: comparison between loss functions – theory and practice
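A minimal preview of the loss comparison, assuming the geomloss package (built on PyTorch and KeOps): a kernel (MMD-type) distance next to an entropy-regularized Wasserstein distance between two point clouds. The blur values are placeholders.

    import torch
    from geomloss import SamplesLoss  # pip install geomloss

    # Two point clouds, understood as uniform probability measures.
    x = torch.randn(500, 2)
    y = torch.randn(500, 2) + torch.tensor([2.0, 0.0])

    mmd = SamplesLoss(loss="gaussian", blur=0.5)         # kernel metric (MMD)
    wass = SamplesLoss(loss="sinkhorn", p=2, blur=0.05)  # entropic OT

    print(mmd(x, y).item(), wass(x, y).item())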
7. Geometric deep learning
– Convolutions on 3D objects
– Convolutions on graphs
– Dynamic graph CNNs, transformers, attention layers
– Conclusion
– Lab session: project presentations, final quiz
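To close the loop between attention layers and the geometric tools above, here is a self-contained PyTorch sketch of scaled dot-product attention: each query aggregates the values of its soft nearest neighbors among the keys, i.e. a data-dependent convolution.

    import torch
    import torch.nn.functional as F

    def attention(Q, K, V):
        # Scaled dot-product attention: a softmax-weighted average of the
        # values V, with weights given by query-key similarities.
        scores = Q @ K.transpose(-2, -1) / K.shape[-1] ** 0.5
        return F.softmax(scores, dim=-1) @ V

    x = torch.randn(1, 100, 64)  # a batch of 100 tokens or 3D points
    out = attention(x, x, x)     # self-attention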
Assessment
Project + presentation + quiz.
Joint projects with other classes are welcome.
Topics covered
This class provides foundational background for geometry in machine learning. It
will be of interest to students who also attend the following classes (the relevant
sessions of this class are indicated in parentheses):
1st semester:
– Introduction to medical image analysis (sessions 1-7)
– Computational optimal transport (sessions 5-6)
– Topological data analysis (sessions 4-5)
– Deep learning (session 7)
2nd semester:
– Geometry and shape spaces (sessions 1-7)
– Longitudinal data analysis (sessions 1-7)
– Kernel methods for machine learning (sessions 3-6)
– 3D point clouds – NPM3D (sessions 5-7)
– Deep learning for medical imaging (sessions 5-7)
– Graphs in machine learning (sessions 4-5, 7)
– Generative models for imaging (sessions 3-6)
– Time series (sessions 3-5, 7)
– Inverse problems and imaging (sessions 3, 5-6)
– Deformable models and geodesic methods (sessions 1-2, 5)
– Information and complexity (sessions 1, 5-6)
– Numerical PDEs for image analysis (session 2)