Mastering Reinforcement Learning: Foundations to Human Feedback

1

Introduction to Reinforcement Learning
- Course Handouts
- What is Reinforcement Learning
- Key Concepts and Terminology
- Quiz
2

Markov Decision Processes
- Introduction to MDPs
- Policy and Value Functions
- Bellman Equation
- Quiz
3

Dynamic Programming
- Dynamic Programming
- Policy Evaluation and Policy Iteration
- Value Iteration
- Value Iteration Implementation
- Quiz
4

Monte Carlo and Temporal-Difference Methods
- Monte Carlo Prediction
- TD Prediction
- Hands-on: MC vs. TD Implementation
- Quiz
5

Model-Free Control
- Generalized Policy Iteration
- SARSA (State-Action-Reward-State-Action)
- Q-Learning
- Quiz
6

Introduction to Deep Reinforcement Learning Algorithms
- Deep Learning Recap
- Combining Deep Learning with RL
- Deep Q-Network
7

Proximal Policy Optimization (PPO) and DDPG
- Deep Deterministic Policy Gradient (DDPG)
- Introduction to Proximal Policy Optimization (PPO)
8

Introduction to RLHF and DPO
- What is RLHF
- What is DPO
9

Techniques and Algorithms for RLHF and DPO
- Implementing RLHF
- Implementing DPO
10

Hands-on
- Fine-Tuning a Large Language Model with RLHF
- Hands-on: Applying DPO to Text Generation