CSC 696H: Topics in Bandits and Reinforcement Learning Theory - Fall 2023

Tentative schedule

| Date | Topics | Notes | Additional readings | Homework |
|------|--------|-------|---------------------|----------|
| Aug 22 | Introduction, course mechanics | Slides | AK lec. 1/24 | |
| Aug 24 | Basic probability tools: concentration of measure; Hoeffding's inequality | | AK lec. 1/24; my 588 lecture note | |
| Aug 29 | Bernstein's inequality; generalization in supervised learning; analysis of ERM | Note 1 | AK lec. 1/24 | |
| Aug 31 | Online learning basics; online-to-batch conversion | | FR Sec. 1.6-1.6.1 | |
| Sep 5 | Azuma's inequality; online learning algorithms: exponential weights | | FR Sec. 1.6.2 | |
| Sep 7 | Finish exponential weights; begin multi-armed bandits | Note 2 | FR Sec. 2 | |
| Sep 12 | MAB algorithms: explore-then-commit, ε-greedy, UCB | | FR Sec. 2.1-2.2, 2.3 | HW1 |
| Sep 14 | Analysis of UCB; stochastic linear bandits | | FR Sec. 2.3, 3.2; LinUCB for news recommendation (Li et al., 2010) | |
| Sep 19 | OFUL: confidence set construction and validity | | | |
| Sep 21 | OFUL: upper confidence bound form and regret analysis | Note 3 | AJKS Sec. 6.2 | |
| Sep 26 | Elliptical potential lemma; bandits with general function approximation | | RecSys 2020 tutorial on Bandits in Recommender Systems; WWW 2023 tutorial: industry perspective on bandit feedback | |
| Sep 28 | OFU for bandits with general function approximation; Eluder dimension | Note 4 | FR Sec. 3.1; the original Eluder dimension paper | |
| Oct 3 | Eluder dimension-based analysis of OFU; contextual bandits beyond optimism | | FR Sec. 4.1 | |
| Oct 5 | Failure of optimism; the SquareCB reduction | | FR Sec. 3.3, 3.5; SquareCB paper; Spanner IGW for large action spaces | HW2 |
| Oct 10 | Finish SquareCB analysis; Decision-Estimation Coefficient | Note 5 | FR Sec. 4.2; ICML 2022 tutorial on Bridging Learning and Decision Making; "The Decision-Estimation Coefficient" by Paul Mineiro | |
| Oct 12 | Finish E2D analysis; begin posterior sampling for CB | | | |
| Oct 17 | Thompson sampling (TS); insufficiency of TS in structured bandits | | Sec. 3 of the Tutorial on TS; Prop. 1 of the FGTS paper; Chapelle and Li, empirical analysis of TS; Osband and Van Roy, Why is Posterior Sampling Better than Optimism for Reinforcement Learning? | |
| Oct 19 | Feel-good Thompson sampling (FGTS) and analysis; begin RL | | TZ Sec. 17.4; Tong Zhang's RL theory seminar talk | |
| Oct 24 | RL basics: the MDP model | Note 6 | FR Sec. 5.1, 5.3 | |
| Oct 26 | Planning in RL: visitation (occupancy) distributions and optimality of Markovian policies; value functions | | FR Sec. 5.2; Csaba Szepesvari's lecture note | |
| Oct 31 | Value functions cont'd; begin online learning in MDPs | | | |
| Nov 2 | Online learning in MDPs: simulation lemma (Bellman residual decomposition) and optimism lemma | Note 7 | FR Sec. 5.4, 5.5 | |
| Nov 7 | UCB-VI algorithm and analysis | | FR Sec. 5.6 | |
| Nov 9 | RL with general function approximation: Bellman rank | | FR Sec. 7.3; ICML talk on bilinear rank; Bellman rank paper; DQN for Atari games | |
| Nov 14 | Student presentation by Winston Zeng: A sharp analysis of model-based reinforcement learning with self-play | Slides | | |
| Nov 16 | Student presentation by Rupal Jain: Contextual Combinatorial Cascading Bandits | Slides | | |
| Nov 21 | Student presentation by Aniket Panda: High-dimensional Sparse Linear Bandits | Slides | | |
| Nov 23 | Thanksgiving recess | | | |
| Nov 28 | Student presentation by Yanan Wang: Oracle-Efficient Pessimism: Offline Policy Optimization in Contextual Bandits | Slides | | |
| Nov 30 | Student presentation by Arush Sharma | Slides | | |
| Dec 5 | Student presentation by Sopan Sarkar | Slides | | |
| Dec 7 | Student presentation by Sadaf Raoufi | Slides | | |