Geometric data analysis
J. FEYDY
Modelling Track Santé

Prerequisites

linear algebra, basic programming with Python

Course objective

Choosing the correct loss, distance and convolution for every problem

Machine learning sits at the intersection of modelling, software engineering and statistics: data scientists write programs that satisfy domain-specific constraints, scale up to real data and can be trained with guarantees.

In this context, geometry is a convenient tool to encode expert knowledge. Clever distances between data samples allow our methods to enforce fundamental properties while keeping a simple mathematical form. This is especially relevant when dealing with structured data such as anatomical images, protein conformations or physical simulations.

The purpose of this class is to provide a unifying geometric perspective on common machine learning models. By the end of the semester, students will be able to answer the following three questions:
– Should I represent my data samples using vectors, graphs or histograms?
– How impactful is the choice of a Euclidean norm vs. a cross entropy?
– Can I relate kernels, graphs, manifolds, optimal transport and transformer networks to each other?


Presentation: here

Website: https://www.jeanfeydy.com/Teaching/index.html

Course organization

7 lectures, 3 hours each.

1. Introduction: why geometry?
– The curse of dimensionality
– Geometry as a modelling tool
– Two motivating examples: the sphere of triangles and Deep Art
– Overview of the class

2. Flat vector spaces
– Decision trees
– Nearest neighbors
– Linear models
– Neural networks
– Kernel methods
– Lab session: the scikit-learn classifiers
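
To give a flavour of the lab session, here is a minimal sketch (an illustration, not the official lab notebook) that compares several of the scikit-learn classifiers listed above on a synthetic two-dimensional dataset; the dataset and hyper-parameters are arbitrary choices:

    # Compare a few standard classifiers on a toy 2D dataset.
    from sklearn.datasets import make_moons
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.neural_network import MLPClassifier
    from sklearn.svm import SVC

    X, y = make_moons(n_samples=500, noise=0.2, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    classifiers = {
        "Decision tree": DecisionTreeClassifier(max_depth=5),
        "k-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
        "Logistic regression": LogisticRegression(),
        "Neural network": MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000),
        "RBF kernel SVM": SVC(kernel="rbf", gamma=2.0),
    }
    for name, clf in classifiers.items():
        clf.fit(X_train, y_train)  # train on 75% of the samples
        print(f"{name}: test accuracy = {clf.score(X_test, y_test):.2f}")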

3. Graphs and embeddings
– Normal distributions are… bubbles!
– The manifold hypothesis
– Dimension
– Curvature
– Lab session: data visualization with UMAP
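
As a preview of the lab session, a minimal sketch along these lines (assuming the umap-learn and matplotlib packages; the digits dataset and parameter values are illustrative choices):

    # Project the 64-dimensional digits dataset to 2D and display it.
    import matplotlib.pyplot as plt
    import umap  # pip install umap-learn
    from sklearn.datasets import load_digits

    digits = load_digits()
    reducer = umap.UMAP(n_neighbors=15, min_dist=0.1, random_state=42)
    embedding = reducer.fit_transform(digits.data)  # shape (1797, 2)

    plt.scatter(embedding[:, 0], embedding[:, 1], c=digits.target, cmap="Spectral", s=5)
    plt.title("UMAP projection of the digits dataset")
    plt.show()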

4. Geometric deep learning
– Convolutions on graphs
– Five major challenges
– Two case studies: protein docking and lung registration
– Should we learn our graphs?
– Lab session: first steps with PyG
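
To fix ideas before the lab session, here is a minimal sketch of a graph convolutional network with PyG (it assumes that torch and torch_geometric are installed; the Cora citation dataset and the two-layer architecture are illustrative choices, not necessarily those used in the lab):

    # Node classification on the Cora citation graph with a two-layer GCN.
    import torch
    import torch.nn.functional as F
    from torch_geometric.datasets import Planetoid
    from torch_geometric.nn import GCNConv

    dataset = Planetoid(root="data/Cora", name="Cora")
    data = dataset[0]  # a single graph: node features, edges, labels, masks

    class GCN(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.conv1 = GCNConv(dataset.num_node_features, 16)
            self.conv2 = GCNConv(16, dataset.num_classes)

        def forward(self, x, edge_index):
            x = F.relu(self.conv1(x, edge_index))  # message passing + ReLU
            return self.conv2(x, edge_index)       # class scores per node

    model = GCN()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)

    model.train()
    for epoch in range(200):
        optimizer.zero_grad()
        out = model(data.x, data.edge_index)
        loss = F.cross_entropy(out[data.train_mask], data.y[data.train_mask])
        loss.backward()
        optimizer.step()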

5. Riemannian metrics and geodesics
– The Poincaré disk
– Discrete and continuous models
– Shortest paths
– Practical applications
– Lab session: first steps with GeomStats
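
A minimal sketch of what GeomStats code looks like (the 2-sphere is an illustrative choice; similar calls exist for hyperbolic models such as the Poincaré ball):

    # Geodesic distances and geodesic paths on the unit 2-sphere.
    import geomstats.backend as gs
    from geomstats.geometry.hypersphere import Hypersphere

    sphere = Hypersphere(dim=2)
    a, b = sphere.random_uniform(n_samples=2)  # two random points of R^3, on the sphere

    # Intrinsic (great-circle) distance between the two points:
    print("Geodesic distance:", sphere.metric.dist(a, b))

    # Sample the geodesic path t -> gamma(t) joining a to b:
    geodesic = sphere.metric.geodesic(initial_point=a, end_point=b)
    path = geodesic(gs.linspace(0.0, 1.0, 10))
    print("Path samples:", path.shape)  # 10 points of R^3, all on the sphere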

6. Probability distributions
– Geometry in statistics
– Information geometry
– Kernel norms
– Optimal transport
– Lab session: first steps with GeomLoss
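
As a preview of the lab session, a minimal GeomLoss sketch (it assumes that torch and geomloss are installed; the random point clouds and the blur value are illustrative choices): it computes a Sinkhorn divergence, i.e. a de-biased entropic optimal transport cost, between two point clouds, together with its gradient.

    # Sinkhorn divergence between two random point clouds in R^3.
    import torch
    from geomloss import SamplesLoss

    x = torch.randn(1000, 3, requires_grad=True)  # source samples
    y = torch.randn(2000, 3)                      # target samples

    loss = SamplesLoss(loss="sinkhorn", p=2, blur=0.05)  # blur = entropic scale
    cost = loss(x, y)
    cost.backward()  # gradient of the OT cost with respect to the source points

    print("OT cost:", cost.item(), "| gradient shape:", tuple(x.grad.shape))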

7. GPU programming
– What is a GPU?
– Main rules of GPU programming
– Accelerating scientific code (see the sketch after this list)
– Is Moore’s law coming to an end?
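
To illustrate the point on accelerating scientific code, here is a minimal PyTorch sketch (not taken from the lecture; the matrix size and number of repetitions are arbitrary) that times the same matrix product on the CPU and, if one is available, on the GPU:

    # Time a large matrix product on the CPU vs. the GPU.
    import time
    import torch

    def bench(device):
        x = torch.randn(4096, 4096, device=device)
        if device == "cuda":
            torch.cuda.synchronize()  # finish the allocation before timing
        start = time.perf_counter()
        for _ in range(10):
            x @ x
        if device == "cuda":
            torch.cuda.synchronize()  # GPU kernels are launched asynchronously
        return time.perf_counter() - start

    print("CPU time (s):", bench("cpu"))
    if torch.cuda.is_available():
        print("GPU time (s):", bench("cuda"))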

Assessment

Project report and presentation (15 points) + quiz (5 points).
Joint projects with other classes are welcome.

Although in-person attendance is preferable, you may complete this class remotely.
Video recordings are available here: https://www.youtube.com/watch?v=mkZ_x46VAqY&list=PLBFtqeJgRBGguiWbxWEz-Ty0CEPpTbZxC
(Audio quality improves after lecture 2.)


Topics covered

This class provides foundational background on the use of geometry in machine learning. It will be of interest to students who also attend the following courses:

1st semester:
– Medical image analysis
– Computational optimal transport
– Topological data analysis
– Deep learning

2nd semester:
– Geometry and shape spaces
– Kernel methods for machine learning
– 3D point clouds
– Deep learning for medical imaging
– Graphs in machine learning
– Generative models for imaging
– Inverse problems and imaging
– Deformable models and geodesic methods
– Information and statistical physics
– Numerical PDEs for image analysis

Instructors

Jean FEYDY

see the other 1st-semester courses