8/24
Administrivia; Introduction to RL
slides |
AJKS Section 1.1-1.2 |
|
8/26 |
MDP basics; Markov reward processes (MRPs); Bellman consistency equation |
Scribe note 0826/ by Chicheng Zhang |
AJKS Section 1.1.2 |
|
8/31 |
Bellman consistency equation and its interpretation; Optimal value functions |
Scribe note 0831/ by Yinan Li |
AJKS Section 1.1.3 |
|
9/2 |
Bellman optimality equation; Contraction mapping; begin planning in MDPs |
Scribe note 0902/ by Brady Gales |
AJKS Section 1.1.3; 1.4.1 |
|
9/7 |
Planning: value iteration; begin policy iteration |
Scribe note 0907/ by Yangzi Lu |
AJKS Section 1.4.1; 1.4.2 |
|
9/9 |
Planning: finish policy iteration; linear programming |
Scribe note 0909/ by Yichen Li |
AJKS Section 1.4.2; 1.5 |
|
9/14 |
Finite horizon episodic MDPs and planning; begin RL with a generative model |
Scribe note 0914/ by Yao Zhao |
AJKS Section 1.2 |
|
9/16 |
Sample-based value iteration and analysis; simulation lemma |
Scribe note 0916/ by Zhiwu Guo |
AJKS Sections 2.1-2.2; Chi Jin’s lecture notes 5, 6; (Optional) AJKS Section 2.3, Proof of Hoeffding’s Inequality
HW1

9/21 |
Q-learning for RL with a generative model |
Scribe note 0921/ by Wenhan Zhang |
Chi Jin’s lecture notes 6, 7
|
9/23 |
Finish Q-learning; Begin online episodic RL; Multi-armed bandits (MAB); Explore-then-commit |
Scribe note 0923/ by Robert Ferrando |
AJKS Sections 7.1, 6.1 |
|
9/28 |
MAB algorithms: epsilon-greedy, UCB; Failure of epsilon-greedy in online episodic RL |
Scribe note 0928/ by Ruoyao Wang |
AJKS Section 6.1; Optimistic Q-learning paper, Appendix A
|
9/30 |
The UCB-VI algorithm and its analysis |
Scribe note 0930/ by Zhengguang Zhang |
AJKS Sections 7.2-7.4; Bernstein’s Inequality: My notes, Theorem 2.8.4 of Vershynin’s book; See also the proofs of AJKS Lemmas 7.3 and 7.8 for alternative proofs of model concentration using Azuma-Hoeffding / Azuma-Bernstein |
|
10/5 |
Finish UCB-VI analysis; Begin RL with function approximation |
Scribe note 1005/ by Hao Qin |
AJKS Sections 7.2-7.4; Chapter 3 |
|
10/7 |
The LSVI algorithm; Linear Bellman completeness; Experiment design / active learning |
Scribe note 1007/ by Bohan Li |
AJKS Sections 3.1-3.3.1 |
|
10/12 |
Statistical guarantees of realizable linear regression; LSVI with G-optimal design |
Scribe note 1012/ by Bao Do |
AJKS Lemma A.9, Theorem A.10, Sections 3.3.2-3.3.3 |
|
10/14 |
Finish LSVI with G-optimal design; Performance difference lemma; begin online RL in linear MDPs |
Scribe note 1014/ by Amir Yazdani |
AJKS Section 1.5 (discounted setting), Section 3.3.3, Sections 8.1-8.2 |
|
10/19 |
Online RL in linear MDPs: LSVI-UCB and its analysis |
Scribe note 1019/ by Yinan Li, Hao Qin, and Yichen Li |
AJKS Sections 8.3-8.7, my draft notes, proof of elliptic potential lemma (page 7 onwards), LSVI-UCB paper |
HW2

10/21 |
Wenhan Zhang: Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction |
slides |
|
|
10/26 |
Yinan Li: Is Q-learning Provably Efficient? |
slides |
|
|
10/28 |
Bohan Li: Near-Optimal Reinforcement Learning with Self-Play
slides |
|
|
11/2 |
Yichen Li: Toward the Fundamental Limits of Imitation Learning |
slides |
|
|
11/4 |
Zhengguang Zhang: On Reinforcement Learning with Adversarial Corruption and Its Application to Block MDP |
slides |
|
|
11/9 |
Ruoyao Wang: Provably Efficient RL with Rich Observations via Latent State Decoding
slides |
|
|
11/11 |
Yao Zhao: Near-Optimal Representation Learning for Linear Bandits and Linear RL (pre-recorded) |
slides |
|
|
11/16 |
Yangzi Lu: Reward-Free Exploration for Reinforcement Learning |
slides |
|
|
11/18 |
Bao Do: Of Moments and Matching: A Game-Theoretic Framework for Closing the Imitation Gap |
slides |
|
|
11/23 |
Robert Ferrando: Adaptive Discretization for Model-Based Reinforcement Learning |
slides |
|
|
11/25 |
No class - Thanksgiving
|
|
|
11/30 |
Zhiwu Guo: Online Learning in Unknown Markov Games |
slides |
|
|
12/2 |
Hao Qin: Nearly Minimax Optimal Reinforcement Learning for Linear Mixture Markov Decision Processes
slides |
|
|
12/7 |
Amir Yazdani: Reinforcement Learning in Reward-Mixing MDPs |
slides |
|
|