Course curriculum
-
1
Introduction to Reinforcement Learning
- Course Handouts
- What is Reinforcement Learning
- Key Concepts and Terminology
- Quiz
-
2
Markov Decision Processes
- Introduction to MDPs
- Policy and Value Functions
- Bellman Equation
- Quiz
-
3
Dynamic Programming
- Dynamic Programming
- Policy Evaluation and Policy Iteration
- Value Iteration
- Value Iteration Implementation
- Quiz
-
4
Monte Carlo and Temporal-Difference Methods
- Monte Carlo Prediction
- TD Prediction
- Hands-on: MC vs. TD Implementation
- Quiz
-
5
Model-Free Control
- Generalized Policy Iteration
- SARSA (State-Action-Reward-State-Action)
- Q-Learning
- Quiz
-
6
Introduction to Deep Reinforcement Learning Algorithms
- Deep Learning Recap
- Combining Deep Learning with RL
- Deep Q-Network
-
7
Proximal Policy Optimization (PPO) and DDPG
- Deep Deterministic Policy Gradient (DDPG)
- Introduction to Proximal Policy Optimization (PPO)
-
8
Introduction to RLHF and DPO
- What is RLHF
- What is DPO
-
9
Techniques and Algorithms for RLHF and DPO
- Implementing RLHF
- Implementing DPO
-
10
Hands-on
- Fine-Tuning a Large Language Model with RLHF
- Hands-on: Applying DPO to Text Generation