ICTS: Reinforcement Learning Bootcamp, Fall 2025 (Aug 4 - Aug 7)

Lecturer: Gaurav Mahajan (gaurav.mahajan@yale.edu)

Lecture Notes: link to pdf

Description

This course covers the basics of reinforcement learning theory. We will start by implementing simple gradient-based algorithms in PyTorch and using them to solve standard control problems such as CartPole and the Atari 2600 game Pong. Along the way, we will explore how to reduce both the sample complexity (the number of interactions with the environment) and the computational complexity (GPU hours) needed to learn an optimal policy.

Lectures

Day 1: Basics of Reinforcement Learning (notes)

Exploration vs Exploitation, and Credit Assignment
Markov Decision Process, Value Functions
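
To make the value-function topic concrete, here is a minimal sketch of value iteration on a toy two-state MDP. The MDP, the discount factor, and the helper names (value_iteration, greedy_policy) are made up for illustration and are not taken from the lecture notes.

```python
# Toy 2-state, 2-action MDP (hypothetical, for illustration only).
# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
GAMMA = 0.5  # discount factor (assumed value)


def q_value(P, V, s, a, gamma):
    """One-step lookahead: expected reward plus discounted next-state value."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def value_iteration(P, gamma, tol=1e-10):
    """Apply the Bellman optimality backup until the values stop changing."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = max(q_value(P, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V


def greedy_policy(P, V, gamma):
    """Pick, in each state, the action with the largest one-step lookahead."""
    return {s: max(P[s], key=lambda a: q_value(P, V, s, a, gamma)) for s in P}


V_star = value_iteration(P, GAMMA)
pi_star = greedy_policy(P, V_star, GAMMA)
print(V_star, pi_star)
```

In this toy example, staying in state 1 earns reward 2 forever, so its value is 2 / (1 - 0.5) = 4, and moving there from state 0 is worth 1 + 0.5 * 4 = 3.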

Day 2: Policy Gradient Methods (notes)

Setup Instructions

Step 1: Create a virtual environment

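python3 -m venv .venv          # on Windows, use "python -m venv .venv" if python3 is not found
source .venv/bin/activate      # on Linux/macOS
.venv\Scripts\activate.bat     # on Windows (cmd.exe); in PowerShell use .venv\Scripts\Activate.ps1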

Step 2: Install required packages

python -m pip install --upgrade pip
pip install torch
pip install "gymnasium[classic-control]"

Step 3: Verify installation

python -c "import torch; print(torch.__version__)"
python -c "import gymnasium as gym; env = gym.make('CartPole-v1'); print(env)"

Environments: CartPole and Pong
Vanilla Policy Gradient Algorithm
Implementing in Python (cartpole.py)
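
As a rough illustration of the vanilla policy gradient idea, the sketch below computes discounted rewards-to-go for one episode and the surrogate loss that weights each action's log-probability by the return that followed it. This is not the contents of cartpole.py; the function names and the plain-float inputs are assumptions made to keep the example dependency-free. In a PyTorch implementation the log-probabilities would typically be tensors produced by the policy network.

```python
def rewards_to_go(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... for each step t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Accumulate from the last step backwards so each G_t reuses G_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def policy_gradient_loss(log_probs, returns):
    """Surrogate loss: minimizing it follows the policy-gradient estimate upward.

    log_probs[t] is the log-probability of the action taken at step t.
    """
    return -sum(lp * g for lp, g in zip(log_probs, returns))


# Three steps with reward 1 each, as in a short CartPole episode.
print(rewards_to_go([1.0, 1.0, 1.0], gamma=0.5))  # → [1.75, 1.5, 1.0]
```

Using the reward-to-go rather than the full episode return is one common variance-reduction choice: an action is only credited with rewards that came after it.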

Day 3: Data Efficient RL (notes)

Bellman Equations and Optimism
Algorithm
Optimization Constraint in Linear Form
Exploration: Bounding the Number of Rounds

Day 4: Computational Complexity (notes)

Complexity Problems
Linear Infinite-Horizon MDP
Linear Finite-Horizon MDP