Geometric data analysis
J. FEYDY
Modelling / Health track

Prerequisites

Linear algebra and basic programming with Python.

Course objectives

Choosing the correct loss, distance and convolution for every problem

Machine learning sits at the intersection of modelling, software engineering and
statistics: data scientists design models that must satisfy domain-specific constraints,
scale up to real data and be trainable with guarantees.
In this context, geometry is a convenient tool to encode expert knowledge. Clever
distances between data samples allow our methods to leverage key priors while keeping
a simple mathematical form. This is especially relevant when working with samples that
have a strong structure such as anatomical images, protein data or physical simulations.
The purpose of this class is to provide a unifying geometric perspective on common
machine learning models. By the end of the semester, students will be able to answer
the following three questions:
Should I represent my data samples using vectors, graphs or histograms?
How impactful is the choice of a Euclidean norm vs. a cross entropy?
Can I relate kernels, graphs, manifolds, optimal transport and transformer networks
to each other?


Session outline

1. Introduction: why geometry?
Linear models and the curse of dimensionality
Geometry as a modelling tool
A motivating example: to work with triangles, use a sphere!
Overview of the class
Lab session: first steps with NumPy, PyTorch and KeOps
2. Linear models on curved spaces
Linear regression
Riemannian metrics
The exponential map
Geodesic regression
Lab session: geodesics in spaces of matrices
3. Feature engineering, kernels
Principal Component Analysis (PCA)
Feature engineering
The kernel trick
Kernel PCA
Lab session: which kernel should I use?
4. Graphs, discrete spaces and manifold embeddings
From local to global structures
The K-Nearest Neighbors graph
Graphs vs kernel embeddings?
Optimizing a faithful representation
Lab session: the UMAP toolbox
5. Algorithmic bottlenecks
Strengths and limitations of GPU hardware
Optimizing register usage
Annealing, multiscale methods
Strategies to “beat” the curse of dimensionality
Lab session: approximate nearest neighbor search
6. Working with probability distributions
Histograms, measures and dual-adversarial losses
Pointwise metrics: cross-entropy, information geometry
Kernel metrics: Sobolev spaces, Maximum Mean Discrepancies
Optimal transport: Wasserstein distance and Earth Mover’s problem
Lab session: comparison between loss functions – theory and practice
7. Geometric deep learning
Convolutions on 3D objects
Convolutions on graphs
Dynamic graph CNNs, transformers, attention layers
Conclusion
Lab session: project presentations, final quiz

Assessment

Project, presentation and quiz.
Joint projects with other classes are welcome.

Topics covered

This class provides foundational background for geometry in machine learning. It
will be of interest to students who also attend the following lectures (the relevant
sessions of this class are given in parentheses):
1st semester:
Introduction to medical image analysis (lectures 1-7)
Computational optimal transport (lectures 5-6)
Topological data analysis (lectures 4-5)
Deep learning (lecture 7)
2nd semester:
Geometry and shape spaces (lectures 1-7)
Longitudinal data analysis (lectures 1-7)
Kernel methods for machine learning (lectures 3-6)
3D point clouds – NPM3D (lectures 5-7)
Deep learning for medical imaging (lectures 5-7)
Graphs in machine learning (lectures 4-5, 7)
Generative models for imaging (lectures 3-6)
Time series (lectures 3-5, 7)
Inverse problems and imaging (lectures 3, 5-6)
Deformable models and geodesic methods (lectures 1-2, 5)
Information and complexity (lectures 1, 5-6)
Numerical PDEs for image analysis (lecture 2)

Instructor

Jean FEYDY
