2023-2024 Spring

Dr. Mesut Körpe

mesutkorpe@gmail.com

Course Description

Large Language Models 

This comprehensive 14-week course covers the fundamentals and advanced concepts of Large Language Models, including transformer architectures, deep neural networks, and model variants such as BERT, GPT, and T5. Students will learn about tokenization strategies, word vectors, and the development of linguistic representations. The middle of the course covers parameter-efficient approaches, model distillation, and fine-tuning methods. Particular attention is paid to cutting-edge techniques such as prompt engineering, reinforcement learning from human feedback, and retrieval-augmented generation (RAG). The final weeks broaden the focus to multimodality in language models, providing an overview of recent research and applications.

Week 1: Deep Neural Networks

  • Fundamentals of Neural Networks
  • Residual Networks (ResNets)
  • Regularization in Neural Networks; Layer Normalization
  • Linearity vs. Non-linearity (see the sketch after this list)
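
The sketch below ties these topics together: a pre-norm residual block in PyTorch, where layer normalization precedes a small feed-forward network with a ReLU non-linearity and a skip connection adds the input back. The layer sizes are illustrative assumptions, not values specified by the course.

    import torch
    import torch.nn as nn

    class ResidualBlock(nn.Module):
        """y = x + F(x): the skip connection lets gradients bypass F."""
        def __init__(self, dim: int):
            super().__init__()
            self.norm = nn.LayerNorm(dim)       # normalize features before F
            self.ff = nn.Sequential(
                nn.Linear(dim, 4 * dim),
                nn.ReLU(),                      # without this, the block is purely affine
                nn.Linear(4 * dim, dim),
            )

        def forward(self, x):
            return x + self.ff(self.norm(x))    # pre-norm residual connection

    x = torch.randn(8, 64)
    print(ResidualBlock(64)(x).shape)           # torch.Size([8, 64])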

Week 2: Word Vectors and Their Evolution

  • Introduction to Word Vectors and their evolution (One-hot encoding, TF-IDF, Word2Vec, GloVe, FastText)
  • Semantic Word vectors vs Contextual Word vectors
  • Tokenization in NLP.
  • Introduction to Byte Pair Encoding (BPE) and WordPiece (a toy BPE merge loop follows this list).
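
The toy loop below shows the core of BPE training: repeatedly find the most frequent adjacent symbol pair in the corpus and merge it into a new symbol. The corpus and the merge count are made up for illustration.

    from collections import Counter

    def most_frequent_pair(words):
        """Count adjacent symbol pairs over a corpus of word -> frequency."""
        pairs = Counter()
        for word, freq in words.items():
            symbols = word.split()
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        return max(pairs, key=pairs.get)

    def merge(words, pair):
        """Rewrite every word, gluing the chosen pair into one symbol."""
        a, b = pair
        return {w.replace(f"{a} {b}", a + b): f for w, f in words.items()}

    # Toy corpus: each word is a space-separated sequence of symbols.
    words = {"l o w": 5, "l o w e r": 2, "n e w e s t": 6, "w i d e s t": 3}
    for _ in range(3):
        pair = most_frequent_pair(words)
        words = merge(words, pair)
        print(pair, "->", list(words))

WordPiece follows the same merge idea but scores candidate pairs by likelihood gain rather than raw frequency.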

Week 3: Transformer Architecture (Overview)

  • Overview of the Transformer architecture.
  • Embedding Layer of Transformer
  • Positional encoding in Transformers (sketched after this list).
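
A minimal sketch of the sinusoidal positional encoding from the original Transformer paper, which adds position information to the embedding layer's output. The seq_len and dim below are arbitrary illustrative values (dim is assumed even).

    import math
    import torch

    def sinusoidal_positions(seq_len: int, dim: int) -> torch.Tensor:
        """PE[pos, 2i] = sin(pos / 10000^(2i/dim)); PE[pos, 2i+1] uses cos."""
        pos = torch.arange(seq_len, dtype=torch.float).unsqueeze(1)   # (seq_len, 1)
        div = torch.exp(torch.arange(0, dim, 2, dtype=torch.float)
                        * (-math.log(10000.0) / dim))                 # (dim/2,)
        pe = torch.zeros(seq_len, dim)
        pe[:, 0::2] = torch.sin(pos * div)   # even dimensions
        pe[:, 1::2] = torch.cos(pos * div)   # odd dimensions
        return pe

    pe = sinusoidal_positions(seq_len=16, dim=8)
    print(pe.shape)   # torch.Size([16, 8])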

Week 4: Transformer Architecture Continued (Attention)

  • Attention Mechanism in the Transformer (scaled dot-product attention is sketched after this list)
  • Self Attention
  • Cross Attention
  • Causal Attention
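
All three attention variants share the same scaled dot-product core, sketched below: self-attention takes Q, K, and V from one sequence, cross-attention takes K and V from another, and causal attention adds a mask over future positions. The shapes are illustrative.

    import torch
    import torch.nn.functional as F

    def attention(q, k, v, causal=False):
        """softmax(Q K^T / sqrt(d)) V, optionally masking future positions."""
        d = q.size(-1)
        scores = q @ k.transpose(-2, -1) / d ** 0.5        # (..., seq_q, seq_k)
        if causal:
            seq = scores.size(-1)
            future = torch.triu(torch.ones(seq, seq, dtype=torch.bool), diagonal=1)
            scores = scores.masked_fill(future, float("-inf"))
        return F.softmax(scores, dim=-1) @ v

    x = torch.randn(2, 5, 16)                 # a batch of 5-token sequences
    out = attention(x, x, x, causal=True)     # self-attention with a causal mask
    print(out.shape)                          # torch.Size([2, 5, 16])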

Week 5: Encoder-Only Models and Encoder-Decoder Models

  • Introduction to BERT.
  • Masked Language Model (MLM) and Next Sentence Prediction (NSP) objectives (MLM input masking is sketched after this list).
  • Pre-training and fine-tuning strategies.
  • Introduction to T5.
  • Pre-training and fine-tuning of T5.
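
As a concrete view of the MLM objective, the sketch below builds BERT-style training inputs: roughly 15% of positions become prediction targets, and of those 80% are replaced by [MASK], 10% by a random token, and 10% left unchanged. The token ids and mask id are toy values.

    import torch

    def mask_tokens(ids, mask_id, vocab_size, p=0.15):
        """Return corrupted inputs and labels for masked language modeling."""
        ids = ids.clone()
        labels = torch.full_like(ids, -100)     # -100 marks positions the loss ignores
        target = torch.rand_like(ids, dtype=torch.float) < p
        labels[target] = ids[target]            # predict the original token here
        r = torch.rand_like(ids, dtype=torch.float)
        ids[target & (r < 0.8)] = mask_id                       # 80% of targets: [MASK]
        swap = target & (r >= 0.8) & (r < 0.9)                  # 10%: random token
        ids[swap] = torch.randint(vocab_size, ids.shape)[swap]
        return ids, labels                                      # remaining 10% unchanged

    ids = torch.randint(5, 100, (1, 12))        # toy token ids
    print(mask_tokens(ids, mask_id=0, vocab_size=100))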

Week 6: Decoder-Only Models (GPT and LLaMA 2)

  • GPT (Generative Pre-trained Transformer)
  • LLaMA 2 (RoPE, RMSNorm, grouped-query attention; RMSNorm is sketched after this list)
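
A minimal sketch of LLaMA-style RMSNorm: unlike LayerNorm, it does not subtract the mean and has no bias, rescaling only by the root mean square. The tensor sizes are illustrative.

    import torch
    import torch.nn as nn

    class RMSNorm(nn.Module):
        """weight * x / rms(x), computed over the feature dimension."""
        def __init__(self, dim: int, eps: float = 1e-6):
            super().__init__()
            self.weight = nn.Parameter(torch.ones(dim))
            self.eps = eps

        def forward(self, x):
            inv_rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
            return self.weight * x * inv_rms

    x = torch.randn(2, 5, 64)
    print(RMSNorm(64)(x).shape)   # torch.Size([2, 5, 64])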

Week 7-8: LLM Distillation and Fine-Tuning

  • Transfer learning in large language models.
  • Concept of model distillation and its techniques.
  • Strategies for fine-tuning (Chain-of-Thought, CoT).
  • Parameter-Efficient Fine-Tuning (PEFT): LoRA, adapters (a LoRA layer is sketched after this list)
  • Quantization and QLoRA
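
The sketch below shows the LoRA idea in isolation: the pretrained weight is frozen and a trainable low-rank update B A is added, so only r * (in + out) parameters are tuned. The values of r and alpha are typical but arbitrary choices here.

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """base(x) + (alpha / r) * B A x, with the base layer frozen."""
        def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad_(False)          # freeze the pretrained weights
            self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
            self.B = nn.Parameter(torch.zeros(base.out_features, r))
            self.scale = alpha / r               # B = 0 makes the update a no-op at init

        def forward(self, x):
            return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

    layer = LoRALinear(nn.Linear(64, 64))
    print(layer(torch.randn(2, 64)).shape)       # torch.Size([2, 64])

QLoRA combines the same low-rank update with a quantized (e.g., 4-bit) frozen base model to cut memory further.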

Week 9-10: LLM Alignment (RLHF)

  • Basics of Reinforcement Learning (RL) and Q-learning.
  • Reinforcement Learning from Human Feedback (RLHF)
  • Reward Model
  • Proximal Policy Optimization (PPO); the reward-model loss and PPO's clipped objective are sketched after this list
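
Two of the quantities above fit in a few lines each: the pairwise (Bradley-Terry) loss used to train reward models on human preference pairs, and PPO's clipped surrogate objective. The toy numbers are illustrative.

    import torch
    import torch.nn.functional as F

    def reward_loss(r_chosen, r_rejected):
        """-log sigmoid(r_chosen - r_rejected): push preferred responses higher."""
        return -F.logsigmoid(r_chosen - r_rejected).mean()

    def ppo_clip_loss(ratio, advantage, eps=0.2):
        """min(r * A, clip(r, 1-eps, 1+eps) * A), negated for gradient descent."""
        clipped = torch.clamp(ratio, 1 - eps, 1 + eps) * advantage
        return -torch.minimum(ratio * advantage, clipped).mean()

    print(reward_loss(torch.tensor([1.2, 0.3]), torch.tensor([0.4, 0.8])))
    print(ppo_clip_loss(torch.tensor([1.3, 0.7]), torch.tensor([1.0, -1.0])))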

Week 11-12: Prompting in LLMs

  • Prompt Engineering
  • In-Context Learning
  • Retrieval-Augmented Generation (RAG); a toy retrieval step is sketched after this list
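
A toy retrieve-then-prompt step, the core of RAG: embed the query, rank documents by cosine similarity, and prepend the top hits to the prompt. Real systems use a trained encoder; the random vectors below merely stand in for embeddings.

    import numpy as np

    def retrieve(query_vec, doc_vecs, k=2):
        """Return indices of the top-k documents by cosine similarity."""
        q = query_vec / np.linalg.norm(query_vec)
        d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
        return np.argsort(d @ q)[::-1][:k]

    docs = ["LoRA adapts LLMs cheaply.",
            "BPE builds subword vocabularies.",
            "PPO optimizes policies."]
    doc_vecs = np.random.default_rng(0).normal(size=(3, 4))   # stand-in embeddings
    query_vec = doc_vecs[1] + 0.1                             # a query "near" doc 1
    context = "\n".join(docs[i] for i in retrieve(query_vec, doc_vecs))
    print("Answer using:\n" + context + "\nQ: What is BPE?")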

Week 13-14: Multimodality in LLMs

  • Text, video, and audio in the Transformer architecture
  • CLIP, UNet (CLIP's contrastive objective is sketched after this list)
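
CLIP's training objective fits in a short sketch: image and text embeddings are normalized, pairwise similarities are computed, and a symmetric cross-entropy pulls matching pairs (the diagonal of the similarity matrix) together. The embedding sizes are illustrative.

    import torch
    import torch.nn.functional as F

    def clip_loss(img_emb, txt_emb, temperature=0.07):
        """Symmetric InfoNCE over a batch of matched image-text pairs."""
        img = F.normalize(img_emb, dim=-1)
        txt = F.normalize(txt_emb, dim=-1)
        logits = img @ txt.T / temperature      # (batch, batch) similarity matrix
        labels = torch.arange(len(logits))      # i-th image matches i-th text
        return (F.cross_entropy(logits, labels)
                + F.cross_entropy(logits.T, labels)) / 2

    img_emb, txt_emb = torch.randn(4, 32), torch.randn(4, 32)
    print(clip_loss(img_emb, txt_emb))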

Grading Policy