Prerequisites
Basic linear algebra, calculus, probability theory
Course objective
Speech and natural language processing is a subfield of artificial intelligence used in an increasing number of applications; yet, while some aspects are on par with human performance, others lag behind. This course will present the full stack of speech and language technology, from automatic speech recognition to parsing and semantic processing. At each level, the course will present the key algorithms and mathematical principles behind the state of the art, and confront them with what is known about human speech and language processing. Students will acquire detailed knowledge of the scientific issues and computational techniques in automatic speech and language processing, and will gain hands-on experience in implementing and evaluating the important algorithms.
Topics:
– speech features & signal processing
– hidden Markov & finite-state modeling
– probabilistic parsing
– continuous embeddings
– deep learning for language-related tasks (DNNs, RNNs)
– linguistics and psycholinguistics
– comparing human and machine performance
Course organization
Eight lectures (2h each) and six practical assignments (with a 1-hour Q&A session each), based around the implementation of key algorithms. For the assignments, students are provided with the necessary data and Python code; they hand in their source code and a report of at most two pages detailing their work, the difficulties encountered, and the results.
1. Speech and language processing: Principles and applications (2h)
Presentation of the speech and language processing stack, the main computational challenges, and the main practical applications.
No assignment.
2. Speech I: Acoustic modeling (2h + 1h)
Algorithms: signal processing, speech features, source separation, GMMs, DNNs
Human processing: auditory neuroscience, psychoacoustics, speech perception, scene analysis
Assignment: Evaluating Speech Features
Given a set of modules (filterbank, compressor, expander, spectrogram), perform ABX tests on different combinations and evaluate classification performance (a sketch follows the readings).
Readings
J&M2 7.1-7.4, 9.3
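As a pointer for the assignment above, here is a minimal sketch of an ABX discrimination test, assuming features have already been extracted as fixed-length NumPy vectors and using Euclidean distance; the function name and toy data are illustrative, and the assignment's own modules and distance measures may differ.

    import numpy as np

    def abx_accuracy(cat_a, cat_b):
        """ABX discrimination: for each triple (A, B, X) with A and X distinct
        tokens of one category and B a token of the other, count X as correct
        when it lies closer to A than to B."""
        correct, total = 0, 0
        for i, a in enumerate(cat_a):
            for x in cat_a[i + 1:]:              # X: another token of A's category
                for b in cat_b:
                    correct += np.linalg.norm(a - x) < np.linalg.norm(b - x)
                    total += 1
        return correct / total

    # Toy usage: random "features" for two phone categories.
    rng = np.random.default_rng(0)
    cat_a = [rng.normal(0.0, 1.0, 13) for _ in range(5)]
    cat_b = [rng.normal(1.0, 1.0, 13) for _ in range(5)]
    print(f"ABX accuracy: {abx_accuracy(cat_a, cat_b):.2f}")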
3. Speech II: Language modeling (2h+1h)
Algorithms: HMM decoding (forward, Viterbi), RNNs, LSTMs, end-to-end CTC, embeddings
Human processing: speaker and accent adaptation, L2 perception
Speech application: orthographic transcription
Assignment: HMM Decoder
Given a trained GMM-HMM, build a decoder and evaluate the phone error rate on a test set (a Viterbi sketch follows the readings).
Readings
J&M2 9.1-9.8
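As a hint for the decoder assignment, a minimal sketch of Viterbi decoding in the log domain follows; it assumes per-frame state log-likelihoods have already been computed (e.g., from the provided GMMs), and all names are illustrative. The phone error rate is then the edit distance between decoded and reference phone sequences, normalized by the reference length.

    import numpy as np

    def viterbi(log_obs, log_trans, log_init):
        """Most likely HMM state sequence, computed in the log domain.
        log_obs:   (T, N) per-frame state log-likelihoods
        log_trans: (N, N) log transition probabilities (from, to)
        log_init:  (N,)   log initial-state probabilities
        """
        T, N = log_obs.shape
        back = np.zeros((T, N), dtype=int)       # best predecessor of each state
        delta = log_init + log_obs[0]            # best score ending in each state
        for t in range(1, T):
            scores = delta[:, None] + log_trans  # scores[i, j]: state i -> state j
            back[t] = scores.argmax(axis=0)
            delta = scores.max(axis=0) + log_obs[t]
        path = [int(delta.argmax())]             # backtrace from best final state
        for t in range(T - 1, 0, -1):
            path.append(int(back[t, path[-1]]))
        return path[::-1]

    # Toy usage: 3 states, 10 frames, uniform transitions.
    rng = np.random.default_rng(0)
    log_obs = np.log(rng.dirichlet(np.ones(3), size=10))
    uniform = np.log(np.full(3, 1 / 3))
    print(viterbi(log_obs, np.log(np.full((3, 3), 1 / 3)), uniform))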
4. Language I: Formal Grammars and Syntax (2h+1h)
Algorithms: the Chomsky hierarchy, finite-state transducers, context-free grammars, mildly context-sensitive formalisms
Human processing: linguistic typology
Assignment: POS Tagger
Given training data, build a hidden Markov model part-of-speech tagger and evaluate its accuracy on a test set (a sketch follows the readings).
Readings
J&M3, ch. 9 & 10
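For the tagger assignment, a minimal sketch of the training (counting) step is given below, assuming sentences arrive as lists of (word, tag) pairs; smoothing and unknown-word handling are omitted, and decoding reuses the Viterbi algorithm from session 3. All names are illustrative.

    from collections import Counter, defaultdict

    def train_hmm_tagger(tagged_sents):
        """Relative-frequency estimates of P(tag | previous tag) and P(word | tag)."""
        trans, emit = defaultdict(Counter), defaultdict(Counter)
        for sent in tagged_sents:
            prev = "<s>"
            for word, tag in sent:
                trans[prev][tag] += 1
                emit[tag][word.lower()] += 1
                prev = tag
            trans[prev]["</s>"] += 1             # sentence-final transition

        def normalize(counts):
            total = sum(counts.values())
            return {k: v / total for k, v in counts.items()}

        return ({p: normalize(cs) for p, cs in trans.items()},
                {t: normalize(cs) for t, cs in emit.items()})

    # Toy usage.
    trans_p, emit_p = train_hmm_tagger([[("the", "DET"), ("dog", "NOUN")]])
    print(trans_p["<s>"]["DET"], emit_p["NOUN"]["dog"])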
5. Language II: Parsing (2h+1h)
Algorithms: PCFGs, treebanks, estimation, chart & dependency parsing
Human processing: syntax, garden-paths, acquisition, dependency, hierarchy
Assignment: Probabilistic Parsing
Given a treebank, extract a PCFG, implement a CKY parser, and evaluate the parsing F1 score on a test set (a sketch follows the readings).
Readings
M. Covington (2001). A Fundamental Algorithm for Dependency Parsing. http://www.stanford.edu/~mjkay/covington.pdf
J&M3 ch. 12 & 13
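For the parsing assignment, a minimal sketch of the CKY inner loop over a PCFG in Chomsky normal form is given below; it computes Viterbi (max) log-probabilities only and omits the backpointers needed to recover trees. The data structures are illustrative, not the assignment's.

    import math
    from collections import defaultdict

    def cky(words, lexicon, rules):
        """Viterbi CKY over a CNF PCFG.
        lexicon: word -> list of (tag, logprob)
        rules:   list of (parent, left, right, logprob) binary rules
        Returns chart[0, n]: best logprob per label spanning the sentence."""
        n = len(words)
        chart = defaultdict(dict)                # (i, j) -> {label: best logprob}
        for i, w in enumerate(words):
            for tag, lp in lexicon.get(w, []):
                chart[i, i + 1][tag] = lp
        for span in range(2, n + 1):
            for i in range(n - span + 1):
                j = i + span
                for k in range(i + 1, j):        # split point
                    for parent, left, right, lp in rules:
                        if left in chart[i, k] and right in chart[k, j]:
                            s = lp + chart[i, k][left] + chart[k, j][right]
                            if s > chart[i, j].get(parent, -math.inf):
                                chart[i, j][parent] = s
        return chart[0, n]

    # Toy grammar: S -> NP VP, with a two-word sentence.
    lexicon = {"dogs": [("NP", math.log(1.0))], "bark": [("VP", math.log(1.0))]}
    rules = [("S", "NP", "VP", math.log(1.0))]
    print(cky(["dogs", "bark"], lexicon, rules))   # {'S': 0.0}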
6. Language III: Language Processing in the wild (2h+1h)
Algorithms: text normalization, coreference, distributional semantics, word embeddings
Human processing: conversational & casual language
Assignment: Evaluating Topic Models
Given a dataset of documents and human topic annotations, correlate different topic models with human judgments (a sketch follows the readings).
Readings
J. Chang, J. Boyd-Graber, C. Wang, S. Gerrish, and D. Blei (2009). Reading Tea Leaves: How Humans Interpret Topic Models. Neural Information Processing Systems. http://www.umiacs.umd.edu/~jbg/docs/nips2009-rtl.pdf
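Following Chang et al. (2009), the evaluation reduces to a rank correlation between per-topic model scores and human judgments; below is a minimal sketch with made-up numbers (any real scores would come from the provided annotations).

    from scipy.stats import spearmanr

    # Hypothetical per-topic scores: a model coherence score and a human score
    # (e.g., word-intrusion detection accuracy) for each of five topics.
    model_scores = [0.42, 0.31, 0.55, 0.12, 0.48]
    human_scores = [0.80, 0.55, 0.90, 0.20, 0.70]

    rho, pval = spearmanr(model_scores, human_scores)
    print(f"Spearman rho = {rho:.2f} (p = {pval:.3f})")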
7. Language IV: Lost in translation (2h+1h)
Algorithms: CNNs, sequence-to-sequence RNNs, attentional models, graph models
Assignment:
Find systematic patterns of errors in Google Translate.
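One possible starting point, as a sketch only: generate templated minimal pairs that differ in a single attribute (here, a pronoun cue), feed them to the translation system, and look for systematic divergences such as gender-stereotyped job titles. The template and word lists are illustrative.

    # Hypothetical template-based probes for a machine translation system.
    TEMPLATE = "The {job} said that {pronoun} would arrive late."
    JOBS = ["doctor", "nurse", "engineer", "teacher"]
    PRONOUNS = ["he", "she"]

    probes = [TEMPLATE.format(job=j, pronoun=p) for j in JOBS for p in PRONOUNS]
    for sentence in probes:
        print(sentence)  # translate each probe and compare outputs pairwise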
8. Open questions and hot topics (2h)
– Linguistic and non-linguistic context
– Unsupervised learning
– Domain adaptation & zero-shot learning
References
The recommended (but not obligatory) textbook for the course is D. Jurafsky & J. Martin, Speech and Language Processing: the 3rd (online) edition for the chapters already available [J&M3], the 2nd edition otherwise [J&M2]. Readings for each session will be provided by the instructors.
Assessment
Grading will be based on the six homework assignments; the final grade will be computed from the best five.
Emmanuel Dupoux
(INRIA CoML)
Benoît Sagot
(INRIA ALMANACH)