Training and Deploying Large-Scale Models
E. OYALLON
Deep Learning

Course objective

This course introduces the foundations and practices of training modern Large Language Models (LLMs) at scale. Students will learn how deep learning models are trained across multiple GPUs, nodes, and clusters, and why distributed training is essential for today’s largest AI systems.

We will cover:

  • Core techniques for distributed training
  • Modern frameworks and scaling strategies
  • Practical implementations with real-world toolchains
  • Theoretical underpinnings of large-scale learning
  • Inference and applications

As LLMs grow in complexity and impact, understanding how they are built and deployed has become essential for researchers and engineers. This series bridges engineering and theory, offering students both the practical skills and deeper insights needed to work with frontier AI systems.
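
As a first taste of the material, the sketch below shows distributed training in its simplest form: several processes each compute gradients on their own batch, and the gradients are averaged before every optimizer step. It uses PyTorch's DistributedDataParallel purely as an illustration; this is an assumption on our part, not necessarily the toolchain used in the labs (see the class webpage for that).

    # Minimal data-parallel training sketch (illustrative only; assumes
    # PyTorch is installed). Launch with:
    #   torchrun --nproc_per_node=2 ddp_sketch.py
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    def main():
        # torchrun sets RANK, WORLD_SIZE, MASTER_ADDR/PORT per process.
        dist.init_process_group(backend="gloo")  # use "nccl" on GPU nodes
        model = DDP(torch.nn.Linear(10, 1))      # syncs gradients across ranks
        opt = torch.optim.SGD(model.parameters(), lr=0.1)
        for _ in range(10):
            x, y = torch.randn(32, 10), torch.randn(32, 1)
            loss = torch.nn.functional.mse_loss(model(x), y)
            opt.zero_grad()
            loss.backward()  # DDP all-reduces gradients here
            opt.step()
        dist.destroy_process_group()

    if __name__ == "__main__":
        main()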

To learn more: https://training-large-models-course.github.io/

Course schedule

The course will consist of 8 sessions. The first 7 sessions will each include 2 hours of lectures followed by 2 hours of hands-on lab work. The final session will be dedicated to grading.

  1. Getting Started on Distributed LLM Training
  2. Systems for ML
  3. Multi-GPU Parallelization Techniques
  4. Communication-Efficient Distributed Optimization
  5. Post-Training
  6. Serving and Deployment
  7. Agentic AI (tentative)
  8. Grading

Bring your laptop, and follow the class webpage to install the required libraries (a GPU is preferred when available) and to find all project and homework instructions.
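
Before the first lab, it is worth checking that your installation sees a GPU. The snippet below is a minimal sanity check and assumes PyTorch is among the required libraries; the class webpage remains the authoritative source for setup instructions.

    # Quick environment sanity check (assumes PyTorch is installed).
    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(f"PyTorch {torch.__version__}, running on: {device}")
    if device == "cuda":
        print("GPU:", torch.cuda.get_device_name(0))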


Assessment

Grades will be based on:
  • Homework 1 (25%)
  • Homework 2 (25%)
  • Project (50%)

Instructors

Edouard OYALLON

CNRS, Sorbonne Université
