2023-2024 Spring

Dr. Mesut Körpe

mesutkorpe@gmail.com

Course Description

Large Language Models 

This comprehensive 14-week course covers the fundamentals and advanced concepts of Large Language Models, including transformer architectures, deep neural networks, and model variants such as BERT, GPT, and T5. Students will learn about tokenization strategies, word vectors, and the development of linguistic representations. The middle of the course covers parameter-efficient approaches, model distillation, and fine-tuning methods. Particular attention is paid to cutting-edge techniques such as prompt engineering, reinforcement learning from human feedback, and retrieval-augmented generation (RAG). The final weeks broaden the focus to multimodality in language models, providing an overview of recent research and applications.

Week 1: Deep Neural Networks

  • Fundamentals of Neural Networks
  • Residual Networks (ResNets)
  • Regularization in Neural Networks; Layer Normalization
  • Linearity vs. Non-linearity (see the sketch after this list)
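
The sketch below ties these topics together: a pre-norm residual block in PyTorch, where layer normalization precedes a small feed-forward network with a ReLU non-linearity and a skip connection adds the input back. The layer sizes are illustrative assumptions, not values specified by the course.

    import torch
    import torch.nn as nn

    class ResidualBlock(nn.Module):
        """y = x + F(x): the skip connection lets gradients bypass F."""
        def __init__(self, dim: int):
            super().__init__()
            self.norm = nn.LayerNorm(dim)       # normalize features before F
            self.ff = nn.Sequential(
                nn.Linear(dim, 4 * dim),
                nn.ReLU(),                      # without this, the block is purely affine
                nn.Linear(4 * dim, dim),
            )

        def forward(self, x):
            return x + self.ff(self.norm(x))    # pre-norm residual connection

    x = torch.randn(8, 64)
    print(ResidualBlock(64)(x).shape)           # torch.Size([8, 64])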

Week 2: Word Vectors and Their Evolution

  • Introduction to Word Vectors and their evolution (One-hot encoding, TF-IDF, Word2Vec, GloVe, FastText)
  • Semantic Word vectors vs Contextual Word vectors
  • Tokenization in NLP.
  • Introduction to Byte Pair Encoding (BPE) and WordPiece (a toy BPE merge loop follows this list).
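
The toy loop below shows the core of BPE training: repeatedly find the most frequent adjacent symbol pair in the corpus and merge it into a new symbol. The corpus and the merge count are made up for illustration.

    from collections import Counter

    def most_frequent_pair(words):
        """Count adjacent symbol pairs over a corpus of word -> frequency."""
        pairs = Counter()
        for word, freq in words.items():
            symbols = word.split()
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        return max(pairs, key=pairs.get)

    def merge(words, pair):
        """Rewrite every word, gluing the chosen pair into one symbol."""
        a, b = pair
        return {w.replace(f"{a} {b}", a + b): f for w, f in words.items()}

    # Toy corpus: each word is a space-separated sequence of symbols.
    words = {"l o w": 5, "l o w e r": 2, "n e w e s t": 6, "w i d e s t": 3}
    for _ in range(3):
        pair = most_frequent_pair(words)
        words = merge(words, pair)
        print(pair, "->", list(words))

WordPiece follows the same merge idea but scores candidate pairs by likelihood gain rather than raw frequency.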

Week 3: Transformer Architecture (Overview)

  • Overview of the Transformer architecture.
  • Embedding Layer of Transformer
  • Positional encoding in Transformers (sketched after this list).
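
A minimal sketch of the sinusoidal positional encoding from the original Transformer paper, which adds position information to the embedding layer's output. The seq_len and dim below are arbitrary illustrative values (dim is assumed even).

    import math
    import torch

    def sinusoidal_positions(seq_len: int, dim: int) -> torch.Tensor:
        """PE[pos, 2i] = sin(pos / 10000^(2i/dim)); PE[pos, 2i+1] uses cos."""
        pos = torch.arange(seq_len, dtype=torch.float).unsqueeze(1)   # (seq_len, 1)
        div = torch.exp(torch.arange(0, dim, 2, dtype=torch.float)
                        * (-math.log(10000.0) / dim))                 # (dim/2,)
        pe = torch.zeros(seq_len, dim)
        pe[:, 0::2] = torch.sin(pos * div)   # even dimensions
        pe[:, 1::2] = torch.cos(pos * div)   # odd dimensions
        return pe

    pe = sinusoidal_positions(seq_len=16, dim=8)
    print(pe.shape)   # torch.Size([16, 8])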

Week 4: Transformer Architecture Continued (Attention)

  • Attention Mechanism in the Transformer (scaled dot-product attention is sketched after this list)
  • Self Attention
  • Cross Attention
  • Causal Attention
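
All three attention variants share the same scaled dot-product core, sketched below: self-attention takes Q, K, and V from one sequence, cross-attention takes K and V from another, and causal attention adds a mask over future positions. The shapes are illustrative.

    import torch
    import torch.nn.functional as F

    def attention(q, k, v, causal=False):
        """softmax(Q K^T / sqrt(d)) V, optionally masking future positions."""
        d = q.size(-1)
        scores = q @ k.transpose(-2, -1) / d ** 0.5        # (..., seq_q, seq_k)
        if causal:
            seq = scores.size(-1)
            future = torch.triu(torch.ones(seq, seq, dtype=torch.bool), diagonal=1)
            scores = scores.masked_fill(future, float("-inf"))
        return F.softmax(scores, dim=-1) @ v

    x = torch.randn(2, 5, 16)                 # a batch of 5-token sequences
    out = attention(x, x, x, causal=True)     # self-attention with a causal mask
    print(out.shape)                          # torch.Size([2, 5, 16])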

Week 5: Encoder-Only Models and Encoder-Decoder Models

  • Introduction to BERT.
  • Masked Language Model (MLM) and Next Sentence Prediction (NSP) objectives (MLM input masking is sketched after this list).
  • Pre-training and fine-tuning strategies.
  • Introduction to T5.
  • Pre-training and fine-tuning of T5.
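
As a concrete view of the MLM objective, the sketch below builds BERT-style training inputs: roughly 15% of positions become prediction targets, and of those 80% are replaced by [MASK], 10% by a random token, and 10% left unchanged. The token ids and mask id are toy values.

    import torch

    def mask_tokens(ids, mask_id, vocab_size, p=0.15):
        """Return corrupted inputs and labels for masked language modeling."""
        ids = ids.clone()
        labels = torch.full_like(ids, -100)     # -100 marks positions the loss ignores
        target = torch.rand_like(ids, dtype=torch.float) < p
        labels[target] = ids[target]            # predict the original token here
        r = torch.rand_like(ids, dtype=torch.float)
        ids[target & (r < 0.8)] = mask_id                       # 80% of targets: [MASK]
        swap = target & (r >= 0.8) & (r < 0.9)                  # 10%: random token
        ids[swap] = torch.randint(vocab_size, ids.shape)[swap]
        return ids, labels                                      # remaining 10% unchanged

    ids = torch.randint(5, 100, (1, 12))        # toy token ids
    print(mask_tokens(ids, mask_id=0, vocab_size=100))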

Week 6: Decoder-Only Models (GPT and LLaMA 2)

  • GPT (Generative Pre-trained Transformer)
  • LLaMA 2 (RoPE, RMSNorm, grouped-query attention; RMSNorm is sketched after this list)
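
A minimal sketch of LLaMA-style RMSNorm: unlike LayerNorm, it does not subtract the mean and has no bias, rescaling only by the root mean square. The tensor sizes are illustrative.

    import torch
    import torch.nn as nn

    class RMSNorm(nn.Module):
        """weight * x / rms(x), computed over the feature dimension."""
        def __init__(self, dim: int, eps: float = 1e-6):
            super().__init__()
            self.weight = nn.Parameter(torch.ones(dim))
            self.eps = eps

        def forward(self, x):
            inv_rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
            return self.weight * x * inv_rms

    x = torch.randn(2, 5, 64)
    print(RMSNorm(64)(x).shape)   # torch.Size([2, 5, 64])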

Week 7-8: LLM Distillation and Fine-Tuning

  • Transfer learning in large language models.
  • Concept of model distillation and its techniques.
  • Strategies for fine-tuning (Chain-of-Thought, CoT).
  • Parameter-Efficient Fine-Tuning (PEFT): LoRA, adapters (a LoRA layer is sketched after this list)
  • Quantization and QLoRA
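
The sketch below shows the LoRA idea in isolation: the pretrained weight is frozen and a trainable low-rank update B A is added, so only r * (in + out) parameters are tuned. The values of r and alpha are typical but arbitrary choices here.

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """base(x) + (alpha / r) * B A x, with the base layer frozen."""
        def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad_(False)          # freeze the pretrained weights
            self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
            self.B = nn.Parameter(torch.zeros(base.out_features, r))
            self.scale = alpha / r               # B = 0 makes the update a no-op at init

        def forward(self, x):
            return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

    layer = LoRALinear(nn.Linear(64, 64))
    print(layer(torch.randn(2, 64)).shape)       # torch.Size([2, 64])

QLoRA combines the same low-rank update with a quantized (e.g., 4-bit) frozen base model to cut memory further.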

Week 9-10: LLM Alignment (RLHF)

  • Basics of Reinforcement Learning (RL) and Q-learning.
  • Reinforcement Learning from Human Feedback (RLHF)
  • Reward Model
  • Proximal Policy Optimization (PPO); the reward-model loss and PPO's clipped objective are sketched after this list
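
Two of the quantities above fit in a few lines each: the pairwise (Bradley-Terry) loss used to train reward models on human preference pairs, and PPO's clipped surrogate objective. The toy numbers are illustrative.

    import torch
    import torch.nn.functional as F

    def reward_loss(r_chosen, r_rejected):
        """-log sigmoid(r_chosen - r_rejected): push preferred responses higher."""
        return -F.logsigmoid(r_chosen - r_rejected).mean()

    def ppo_clip_loss(ratio, advantage, eps=0.2):
        """min(r * A, clip(r, 1-eps, 1+eps) * A), negated for gradient descent."""
        clipped = torch.clamp(ratio, 1 - eps, 1 + eps) * advantage
        return -torch.minimum(ratio * advantage, clipped).mean()

    print(reward_loss(torch.tensor([1.2, 0.3]), torch.tensor([0.4, 0.8])))
    print(ppo_clip_loss(torch.tensor([1.3, 0.7]), torch.tensor([1.0, -1.0])))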

Week 11-12: Prompting in LLMs

  • Prompt Engineering
  • In-Context Learning
  • Retrieval-Augmented Generation (RAG); a toy retrieval step is sketched after this list
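
A toy retrieve-then-prompt step, the core of RAG: embed the query, rank documents by cosine similarity, and prepend the top hits to the prompt. Real systems use a trained encoder; the random vectors below merely stand in for embeddings.

    import numpy as np

    def retrieve(query_vec, doc_vecs, k=2):
        """Return indices of the top-k documents by cosine similarity."""
        q = query_vec / np.linalg.norm(query_vec)
        d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
        return np.argsort(d @ q)[::-1][:k]

    docs = ["LoRA adapts LLMs cheaply.",
            "BPE builds subword vocabularies.",
            "PPO optimizes policies."]
    doc_vecs = np.random.default_rng(0).normal(size=(3, 4))   # stand-in embeddings
    query_vec = doc_vecs[1] + 0.1                             # a query "near" doc 1
    context = "\n".join(docs[i] for i in retrieve(query_vec, doc_vecs))
    print("Answer using:\n" + context + "\nQ: What is BPE?")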

Week 13-14: Multimodality in LLMs

  • Text, video, and audio in the Transformer architecture
  • CLIP, UNet (CLIP's contrastive objective is sketched after this list)
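
CLIP's training objective fits in a short sketch: image and text embeddings are normalized, pairwise similarities are computed, and a symmetric cross-entropy pulls matching pairs (the diagonal of the similarity matrix) together. The embedding sizes are illustrative.

    import torch
    import torch.nn.functional as F

    def clip_loss(img_emb, txt_emb, temperature=0.07):
        """Symmetric InfoNCE over a batch of matched image-text pairs."""
        img = F.normalize(img_emb, dim=-1)
        txt = F.normalize(txt_emb, dim=-1)
        logits = img @ txt.T / temperature      # (batch, batch) similarity matrix
        labels = torch.arange(len(logits))      # i-th image matches i-th text
        return (F.cross_entropy(logits, labels)
                + F.cross_entropy(logits.T, labels)) / 2

    img_emb, txt_emb = torch.randn(4, 32), torch.randn(4, 32)
    print(clip_loss(img_emb, txt_emb))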

Grading Policy