CSC 696H Fall 2024

CSC 696H: Topics in Bandits and Reinforcement Learning Theory - Fall 2024

Tentative schedule

For a preview of lecture notes & upcoming topics, see last year’s schedule page.

Date	Topics	Notes	Additional readings	Homework
Aug 30	Introduction and course mechanics; basic probability tools	Intro slides; scribe note by Chenyi Wang	AK lec. 1/24
Sep 6	Finish basic probability tools; generalization in supervised learning	Note 1; scribe note by Junfeng Xu
Sep 13	Online learning basics: online-to-batch conversion and the follow-the-leader algorithm	Note 2; scribe note by Baoying Feng	FR Sec. 1.6-1.6.1	HW1
Sep 20	Online learning: exponential weights algorithm	Scribe note by Rethvick Sriram Yugendra Babu	FR Sec. 1.6.2, FR Sec. 2.1-2.2
Sep 27	Multi-armed bandits: basic algorithms; optimism in the face of uncertainty (OFU) principle	Note 3; scribe note by Aryan Pathare	FR Sec. 2.1-2.2, 2.3
Oct 4	Finish UCB analysis; begin stochastic linear bandits; LinUCB algorithm	Scribe note by Razvan Dumitru	FR Sec. 2.3, 3.2; LinUCB for news recommendation (Li et al., 2010)
Oct 11	LinUCB analysis: confidence set construction and regret analysis	Scribe note by Brandon Hall		HW2
Oct 18	Finish LinUCB analysis; MDPs: planning and control	Note 4; scribe note by Tuan Nguyen & Sathvik Reddy Nookala	FR Sec. 5.1, 5.3
Oct 25	MDPs: Bellman equations; online reinforcement learning in MDPs	Scribe note by Chi-Heng Yang
Nov 1	Brandon Hall: POMDPs; Chi-Heng Yang: self-play preference optimization
Nov 8	Razvan-Gabriel Dumitru: Direct Nash Optimization; Tuan Nguyen: importance-weighted offline contextual bandits
Nov 15	Junfeng Xu: unknown Markov games; Baoying Feng: safe learning from demonstrations for constrained MDPs
Nov 22	Sathvik Reddy Nookala: constrained RL; Rethvick Yugendra Babu: distributional RL
Nov 29	Thanksgiving recess
Dec 6	Chenyi Wang: reward centering; Aryan Pathare: optimal model-free RL