Large Language Models
2023-2024 Spring
Dr. Mesut Körpe
Course Description
This 14-week course covers the fundamentals and advanced concepts of Large Language Models, including deep neural networks, the Transformer architecture, and model variants such as BERT, GPT, and T5. Students will learn about tokenization strategies, word vectors, and the development of linguistic representations. The middle weeks cover model distillation, fine-tuning methods, and parameter-efficient approaches. Particular attention is paid to cutting-edge techniques such as prompt engineering, reinforcement learning from human feedback, and retrieval-augmented generation (RAG). The final weeks broaden the focus to multimodality in language models, with an overview of the latest research and applications.
Week 1: Deep Neural Networks
- Fundamentals of Neural Networks
- Residual Networks (ResNets)
- Regularization in Neural Networks; Layer Normalization
- Linearity vs. Non-linearity (see the sketch after this list)
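A minimal NumPy sketch tying these Week 1 ideas together: a residual block y = x + f(x), where f uses layer normalization and a ReLU non-linearity. All dimensions and weights below are illustrative, not taken from the course materials.

    import numpy as np

    def layer_norm(x, eps=1e-5):
        # Normalize each vector to zero mean and unit variance over its features.
        mu = x.mean(axis=-1, keepdims=True)
        var = x.var(axis=-1, keepdims=True)
        return (x - mu) / np.sqrt(var + eps)

    def residual_block(x, W1, W2):
        # y = x + f(x): the identity skip path lets gradients flow around f,
        # which is what makes very deep networks (ResNets) trainable.
        h = np.maximum(0.0, layer_norm(x) @ W1)   # ReLU supplies the non-linearity
        return x + h @ W2

    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 8))              # a batch of 4 feature vectors
    W1 = 0.1 * rng.normal(size=(8, 16))
    W2 = 0.1 * rng.normal(size=(16, 8))
    print(residual_block(x, W1, W2).shape)   # (4, 8): input shape is preserved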
Week 2: Word Vectors and Their Evolution
- Introduction to Word Vectors and their evolution (One-hot encoding, TF-IDF, Word2Vec, GloVe, FastText)
- Semantic word vectors vs. contextual word vectors
- Tokenization in NLP
- Introduction to Byte Pair Encoding (BPE) and WordPiece (see the merge-loop sketch below)
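A toy sketch of the BPE merge loop (the corpus and number of merges are illustrative): each step finds the most frequent adjacent symbol pair and merges it into a new vocabulary symbol.

    from collections import Counter

    def most_frequent_pair(vocab):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for word, freq in vocab.items():
            symbols = word.split()
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        return max(pairs, key=pairs.get)

    def merge_pair(pair, vocab):
        # Replace every occurrence of the pair with its concatenation.
        a, b = pair
        return {word.replace(f"{a} {b}", a + b): freq
                for word, freq in vocab.items()}

    # Toy corpus: words pre-split into characters, with end-of-word marker </w>.
    vocab = {"l o w </w>": 5, "l o w e r </w>": 2, "n e w e s t </w>": 6}
    for _ in range(3):
        pair = most_frequent_pair(vocab)
        vocab = merge_pair(pair, vocab)
        print(pair, vocab)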
Week 3: Transformer Architecture (Overview)
- Overview of the Transformer architecture
- The embedding layer of the Transformer
- Positional encoding in Transformers (see the sketch below)
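A short sketch of the sinusoidal positional encoding from the original Transformer paper (sequence length and model width here are arbitrary): each position gets a unique pattern of sines and cosines that is added to its token embedding.

    import numpy as np

    def positional_encoding(seq_len, d_model):
        # PE[pos, 2i]   = sin(pos / 10000^(2i/d_model))
        # PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model))
        pos = np.arange(seq_len)[:, None]
        i = np.arange(0, d_model, 2)[None, :]
        angles = pos / np.power(10000.0, i / d_model)
        pe = np.zeros((seq_len, d_model))
        pe[:, 0::2] = np.sin(angles)
        pe[:, 1::2] = np.cos(angles)
        return pe

    pe = positional_encoding(seq_len=50, d_model=16)
    print(pe.shape)   # (50, 16); added to token embeddings before the first layer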
Week 4: Transformer Architecture Continued (Attention)
- The attention mechanism in the Transformer
- Self-Attention
- Cross-Attention
- Causal Attention (see the combined sketch below)
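A compact NumPy sketch of scaled dot-product attention covering the three variants above: self-attention (Q, K, V from the same sequence), cross-attention (Q from one sequence, K and V from another), and causal attention (a lower-triangular mask). Shapes are illustrative.

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    def attention(Q, K, V, causal=False):
        # Scaled dot-product attention: scores[i, j] says how much
        # query i attends to key j.
        scores = Q @ K.T / np.sqrt(Q.shape[-1])
        if causal:
            # Causal mask: position i may not attend to positions j > i.
            mask = np.tril(np.ones_like(scores))
            scores = np.where(mask == 1, scores, -1e9)
        return softmax(scores) @ V

    rng = np.random.default_rng(0)
    X = rng.normal(size=(5, 8))               # 5 tokens, model width 8
    out = attention(X, X, X, causal=True)     # self-attention with a causal mask
    # Cross-attention would instead be attention(decoder_X, encoder_X, encoder_X).
    print(out.shape)                          # (5, 8)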
Week 5: Encoder-Only Models and Encoder-Decoder Models
- Introduction to BERT
- Masked Language Model (MLM) and Next Sentence Prediction (NSP) objectives (see the masking sketch below)
- Pre-training and fine-tuning strategies
- Introduction to T5
- Pre-training and fine-tuning of T5
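An illustrative sketch of BERT-style MLM input corruption (token ids, mask id, and vocabulary size are made up): roughly 15% of positions become prediction targets; of those, 80% are replaced by [MASK], 10% by a random token, and 10% are left unchanged.

    import numpy as np

    def mask_for_mlm(token_ids, mask_id, vocab_size, p=0.15, seed=0):
        # Pick ~p of all positions as prediction targets, then corrupt them:
        # 80% -> [MASK], 10% -> random token, 10% -> left unchanged.
        rng = np.random.default_rng(seed)
        ids = np.array(token_ids)
        labels = np.full_like(ids, -100)    # -100 marks positions the loss ignores
        targets = rng.random(ids.shape) < p
        labels[targets] = ids[targets]      # the model must recover these ids
        roll = rng.random(ids.shape)
        ids[targets & (roll < 0.8)] = mask_id
        rand = targets & (roll >= 0.8) & (roll < 0.9)
        ids[rand] = rng.integers(0, vocab_size, rand.sum())
        return ids, labels

    ids, labels = mask_for_mlm(list(range(200, 220)), mask_id=103, vocab_size=30000)
    print(ids, labels, sep="\n")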
Week 6: Decoder-Only Models (GPT and Llama 2)
- GPT (Generative Pre-trained Transformer)
- Llama 2 (RoPE, RMSNorm, Grouped-Query Attention; see the RMSNorm sketch below)
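A minimal sketch of RMSNorm as used in Llama 2 (shapes are illustrative): unlike layer normalization, it does not subtract the mean and has no bias term, only a learned gain.

    import numpy as np

    def rms_norm(x, g, eps=1e-6):
        # RMSNorm: rescale by the root mean square of the features.
        # No mean subtraction and no bias, making it cheaper than LayerNorm.
        rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
        return x / rms * g

    x = np.random.default_rng(0).normal(size=(4, 8))
    g = np.ones(8)                    # learned gain, initialized to ones
    print(rms_norm(x, g).shape)       # (4, 8)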
Week 7-8: LLM Distillation and Fine-Tuning
- Transfer learning in large language models
- Model distillation and its techniques
- Strategies for fine-tuning, including Chain-of-Thought (CoT)
- Parameter-Efficient Fine-Tuning (PEFT): LoRA and Adapters (see the LoRA sketch below)
- Quantization, QLoRA
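A minimal sketch of a LoRA forward pass (dimensions, rank, and scaling are illustrative): the frozen weight W is augmented with a trainable low-rank update, so only r*(d_in + d_out) parameters are trained instead of d_in*d_out.

    import numpy as np

    def lora_forward(x, W, A, B, alpha=16, r=4):
        # LoRA: y = x @ W + (alpha / r) * x @ A @ B, with W frozen and
        # only the rank-r factors A and B updated during fine-tuning.
        return x @ W + (x @ A) @ B * (alpha / r)

    rng = np.random.default_rng(0)
    d_in, d_out, r = 64, 64, 4
    W = rng.normal(size=(d_in, d_out))      # frozen pretrained weight
    A = 0.01 * rng.normal(size=(d_in, r))   # trainable rank-r factor
    B = np.zeros((r, d_out))                # zero init: training starts exactly at W
    x = rng.normal(size=(2, d_in))
    print(lora_forward(x, W, A, B, r=r).shape)   # (2, 64)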
Week 9-10: LLM Alignment (RLHF)
- Basics of Reinforcement Learning (RL) and Q-learning (see the update-rule sketch below)
- Reinforcement Learning from Human Feedback (RLHF)
- Reward Model
- Proximal Policy Optimization (PPO)
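A tiny sketch of the tabular Q-learning update (state and action counts are arbitrary). RLHF builds on the same RL machinery: the reward model supplies the reward signal, and PPO replaces this tabular update with a policy-gradient step.

    import numpy as np

    # Tabular Q-learning update rule:
    #   Q(s, a) <- Q(s, a) + lr * (r + gamma * max_a' Q(s', a') - Q(s, a))
    def q_update(Q, s, a, reward, s_next, lr=0.1, gamma=0.99):
        td_target = reward + gamma * Q[s_next].max()
        Q[s, a] += lr * (td_target - Q[s, a])

    Q = np.zeros((5, 2))              # 5 states, 2 actions, all estimates start at 0
    q_update(Q, s=0, a=1, reward=1.0, s_next=2)
    print(Q[0])                       # action 1's estimate moved toward the reward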
Week 11-12: Prompting in LLMs
- Prompt Engineering
- In-Context Learning
- Retrieval-Augmented Generation (RAG); see the retrieval sketch below
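An illustrative sketch of the retrieval step in RAG (the documents and embeddings are random stand-ins; a real system embeds both query and documents with a trained encoder): documents are ranked by cosine similarity to the query, and the top hits are prepended to the LLM prompt as context.

    import numpy as np

    def top_k_by_cosine(query_vec, doc_vecs, k=2):
        # Rank documents by cosine similarity to the query embedding.
        q = query_vec / np.linalg.norm(query_vec)
        d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
        return np.argsort(d @ q)[::-1][:k]

    docs = ["BPE merges frequent symbol pairs.",
            "RoPE rotates query/key pairs by position.",
            "LoRA adds a low-rank update to frozen weights."]
    rng = np.random.default_rng(0)
    doc_vecs = rng.normal(size=(3, 16))                   # stand-in embeddings
    query_vec = doc_vecs[2] + 0.1 * rng.normal(size=16)   # query "near" doc 2
    for i in top_k_by_cosine(query_vec, doc_vecs):
        print(docs[i])   # retrieved passages go into the LLM prompt as context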
Week 13-14: Multimodality in LLMs
- Text, video, and audio in the Transformer architecture
- CLIP, Une (see the similarity sketch below)
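A minimal sketch of CLIP-style image-text matching (the embeddings are random stand-ins for the outputs of real image and text encoders): both modalities are L2-normalized and compared pairwise; contrastive training pushes the matching diagonal above the off-diagonal entries.

    import numpy as np

    def clip_logits(img_emb, txt_emb, temperature=0.07):
        # L2-normalize both modalities, then take all pairwise cosine
        # similarities; the diagonal holds the matching image-text pairs.
        img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
        txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
        return img @ txt.T / temperature

    rng = np.random.default_rng(0)
    logits = clip_logits(rng.normal(size=(4, 32)), rng.normal(size=(4, 32)))
    print(logits.shape)   # (4, 4); training maximizes the diagonal entries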
Grading Policy
- 15% Survey Paper (on one of the LLM topics in the Course Overview sheet; must use the arXiv template, https://www.overleaf.com/gallery/tagged/arxiv)
- 30% Midterm Exam
- 40% Final Exam
- 15% Final Presentation