Course curriculum

  • 1
    Introduction to Reinforcement Learning
    • Course Handouts
    • What is Reinforcement Learning
    • Key Concepts and Terminology
    • Quiz
  • 2
    Markov Decision Processes
    • Introduction to MDPs
    • Policy and Value Functions
    • Bellman Equation
    • Quiz
  • 3
    Dynamic Programming
    • Dynamic Programming
    • Policy Evaluation and Policy Iteration
    • Value Iteration
    • Value Iteration Implementation
    • Quiz
  • 4
    Monte Carlo and Temporal-Difference Methods
    • Monte Carlo Prediction
    • TD Prediction
    • Hands-on: MC vs TD Implementation
    • Quiz
  • 5
    Model-Free Control
    • Generalized Policy Iteration
    • SARSA (State-Action-Reward-State-Action)
    • Q-Learning
    • Quiz
  • 6
    Introduction to Deep Reinforcement Learning Algorithms
    • Deep Learning Recap
    • Combining Deep Learning with RL
    • Deep Q-Network
  • 7
    Proximal Policy Optimization (PPO) and DDPG
    • Deep Deterministic Policy Gradient (DDPG)
    • Intro to Proximal Policy Optimization
  • 8
    Introduction to RLHF and DPO
    • What is RLHF
    • What is DPO
  • 9
    Techniques and Algorithms for RLHF and DPO
    • Implementing RLHF
    • Implementing DPO
  • 10
    Hands-on
    • Fine-Tuning a Large Language Model with RLHF
    • Hands-on: Applying DPO to Text Generation