LLM for code and proof
N. FIJALKOW, M. LELARGE
Deep Learning, Natural Language Processing

Prerequisites

The course requires mastery of Python and PyTorch as well as basic knowledge of linear algebra and optimization.

The following courses taught at MVA cover the necessary prerequisites (taking any one of them is sufficient preparation for this course):

Course objective

Recent advances in large language models (LLMs) have enabled remarkable progress in program synthesis and code generation. This course explores the foundations and methodologies behind modern neural code generation, with a particular focus on Transformer-based architectures and LLM techniques. The course has two main objectives: (1) to provide students with a deep understanding of the core techniques for training and fine-tuning neural models for code generation, including inference strategies and evaluation metrics specific to code, and (2) to introduce current research in neural program synthesis, highlighting applications in software engineering, reasoning, and formal verification.


Course page: https://llm.labri.fr/

Organization of sessions

Organization of sessions: https://llm.labri.fr/

The course can accommodate up to 90 students.

Grading

Homework (30%) + project (70%)

Topics covered

Transformer architectures, attention mechanisms, and KV-cache for efficient inference.
Tokenization strategies for linguistic and code-based datasets.
Fine-tuning techniques such as LoRA for task-specific adaptation.
Scaling laws for optimizing LLM performance.
Decoding strategies for code generation, including sampling-based and greedy methods.
Retrieval-augmented generation (RAG) for incorporating external knowledge.
Structured generation techniques for syntax-constrained outputs.
Applications of LLMs in formal verification and automated theorem proving.
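As an illustration of the decoding strategies listed above, here is a minimal sketch contrasting greedy decoding with temperature sampling over toy next-token logits. The function names and logit values are illustrative, not taken from any particular library or model:

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_pick(logits):
    """Greedy decoding: always take the highest-scoring token id."""
    return max(range(len(logits)), key=lambda i: logits[i])

def sample_pick(logits, temperature=1.0, rng=None):
    """Temperature sampling: T < 1 sharpens the distribution, T > 1 flattens it."""
    rng = rng or random.Random()
    probs = softmax([x / temperature for x in logits])
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1  # guard against floating-point rounding

# Toy next-token logits (illustrative values, not from a real model)
logits = [2.0, 1.0, 0.1]
print(greedy_pick(logits))  # always 0, the argmax
print(sample_pick(logits, temperature=0.7, rng=random.Random(0)))
```

Greedy decoding is deterministic and often suboptimal for code, where a single early token choice can commit the model to an incorrect program; sampling with a tuned temperature (optionally combined with multiple candidates and execution-based filtering) is the usual remedy.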

By the end of the course, students will gain both theoretical insights and hands-on experience in building and evaluating neural models for code generation.

Instructors

Nathanaël FIJALKOW

(CNRS, LaBRI)

Marc LELARGE

(INRIA Paris)
