CSC 696H: Topics in Bandits and Reinforcement Learning Theory - Fall 2024
Tentative schedule
For a preview of lecture notes & upcoming topics, see last year’s schedule page.
Date | Topics | Notes | Additional readings | Homework |
---|---|---|---|---|
Aug 30 | Introduction, course mechanics; Basic probability tools | Intro Slides; Scribe note by Chenyi Wang | AK lec. 1/24 | |
Sep 6 | Finish basic probability tools; generalization in supervised learning | Note 1; Scribe note by Junfeng Xu | ||
Sep 13 | Online learning basics: online to batch conversion and follow the leader algorithm | Note 2; Scribe note by Baoying Feng | FR Sec. 1.6-1.6.1 | HW1 |
Sep 20 | Online learning: exponential weight algorithm | Scribe note by Rethvick Sriram Yugendra Babu | FR Sec. 1.6.2, FR Sec. 2.1-2.2 | |
Sep 27 | Multi-armed bandits: basic algorithms; optimism in the face of uncertainty (OFU) principle | Note 3; Scribe note by Aryan Pathare | FR Sec. 2.1-2.2, 2.3 | |
Oct 4 | Finish UCB analysis; Begin Stochastic Linear Bandits; LinUCB algorithm | Scribe note by Razvan Dumitru | FR Sec. 2.3, 3.2, LinUCB for news recommendation (Li et al., 2010) | |
Oct 11 | LinUCB analysis: Confidence Set Construction & regret analysis | Scribe note by Brandon Hall | | HW2 |
Oct 18 | Finish LinUCB analysis; MDPs: Planning and Control | Note 4; Scribe note by Tuan Nguyen & Sathvik Reddy Nookala | FR Sec. 5.1, 5.3 | |
Oct 25 | MDPs: Bellman equations; Online Reinforcement Learning in MDPs | Scribe note by Chi-Heng Yang | ||
Nov 1 | Brandon Hall: POMDPs; Chi-Heng Yang: Self-play preference optimization | |||
Nov 8 | Razvan-Gabriel Dumitru: Direct Nash Optimization; Tuan Nguyen: Importance-weighted offline contextual bandits | |||
Nov 15 | Junfeng Xu: unknown Markov games; Baoying Feng: Safe Learning from demonstrations for Constrained MDP | |||
Nov 22 | Sathvik Reddy Nookala: Constrained RL; Rethvick Yugendra Babu: distributional RL | |||
Nov 29 | Thanksgiving recess | |||
Dec 6 | Aryan Pathare: optimal model-free RL; Chenyi Wang: reward centering | |||