CSC 696H: Topics in Bandits and Reinforcement Learning Theory - Fall 2024

Tentative schedule

For a preview of lecture notes & upcoming topics, see last year’s schedule page.

Date | Topics | Notes | Additional readings | Homework
Aug 30 | Introduction, course mechanics; Basic probability tools | Intro Slides; Scribe note by Chenyi Wang | AK lec. 1/24 |
Sep 6 | Finish basic probability tools; generalization in supervised learning | Note 1; Scribe note by Junfeng Xu | |
Sep 13 | Online learning basics: online to batch conversion and follow the leader algorithm | Note 2; Scribe note by Baoying Feng | FR Sec. 1.6-1.6.1 | HW1
Sep 20 | Online learning: exponential weight algorithm | Scribe note by Rethvick Sriram Yugendra Babu | FR Sec. 1.6.2, FR Sec. 2.1-2.2 |
Sep 27 | Multi-armed bandits: basic algorithms; optimism in the face of uncertainty (OFU) principle | Note 3; Scribe note by Aryan Pathare | FR Sec. 2.1-2.2, 2.3 |
Oct 4 | Finish UCB analysis; Begin Stochastic Linear Bandits; LinUCB algorithm | Scribe note by Razvan Dumitru | FR Sec. 2.3, 3.2; LinUCB for news recommendation (Li et al., 2010) |
Oct 11 | LinUCB analysis: Confidence Set Construction & regret analysis | Scribe note by Brandon Hall | | HW2
Oct 18 | Finish LinUCB analysis; MDPs: Planning and Control | Note 4; Scribe note by Tuan Nguyen & Sathvik Reddy Nookala | FR Sec. 5.1, 5.3 |
Oct 25 | MDPs: Bellman equations; Online Reinforcement Learning in MDPs | Scribe note by Chi-Heng Yang | |
Nov 1 | Brandon Hall: POMDPs; Chi-Heng Yang: Self-play preference optimization | | |
Nov 8 | Razvan-Gabriel Dumitru: Direct Nash Optimization; Tuan Nguyen: Importance-weighted offline contextual bandits | | |
Nov 15 | Junfeng Xu: unknown Markov games; Baoying Feng: Safe Learning from demonstrations for Constrained MDP | | |
Nov 22 | Sathvik Reddy Nookala: Constrained RL; Rethvick Yugendra Babu: distributional RL | | |
Nov 29 | Thanksgiving recess | | |
Dec 6 | Aryan Pathare: optimal model-free RL; Chenyi Wang: reward centering | | |