CSC 696H: Topics in Bandits and Reinforcement Learning Theory (Fall 2024)
Many modern applications, such as e-commerce, robotics, healthcare, and autonomous driving, can be viewed as sequential decision making problems: a learning agent learns to take a sequence of actions that maximizes its reward. Among these, bandit problems (and their contextual variants) concern decision making with independent observations, where the learner only sees the reward of the action it takes; reinforcement learning (RL) generalizes bandits by taking into account the temporal dependencies among observations, thereby enabling the learning of intelligent behavior that maximizes long-term return. This course studies bandits and RL from a theoretical perspective: when and how can we design bandit and RL algorithms with provable guarantees? Specifically, we will look at recent advances in this area, such as principled exploration in bandits and RL (in both tabular and function approximation settings), policy optimization in RL, and offline RL. In the first part of this course, students will learn the necessary mathematical background (such as concentration inequalities, online learning, Markov Decision Processes, and optimization tools) for the design and analysis of bandit and RL algorithms. In the second part, each registered student will present a recent paper on bandit or RL theory.
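To make the "principled exploration" theme concrete, here is a minimal sketch (not course material) of the classical UCB1 algorithm on Bernoulli-reward arms: the learner pulls the arm maximizing its empirical mean plus a confidence bonus, which trades off exploration and exploitation. The arm means and horizon below are illustrative choices, not from the course.

```python
import math
import random

def ucb1(means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms with the given true means.

    Returns the total reward collected and the per-arm pull counts.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k   # number of pulls per arm
    sums = [0.0] * k   # cumulative reward per arm
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull each arm once to initialize estimates
        else:
            # choose the arm maximizing empirical mean + exploration bonus
            arm = max(range(k), key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total, counts

total, counts = ucb1([0.2, 0.5, 0.8], horizon=5000)
# Over time, the best arm (mean 0.8) receives the vast majority of pulls,
# reflecting UCB1's logarithmic regret guarantee.
```

The course will make this intuition precise via concentration inequalities, which justify the `sqrt(2 log t / n)` confidence width used above.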
Logistics info
Time and venue: Fridays 1:30-4:10pm, Gould-Simpson 701
D2L course webpage: lecture video recordings can be found under “UA Tools” -> “Panopto”
Gradescope page: enroll code YRZR4Z
We will be using Piazza to make important announcements and do Q&As. Please self-enroll here. Note:
- If you have technical questions, try phrasing them as generally as possible to promote discussion.
- If you have private questions, make a private Piazza post (sending me an email may delay my response).
Instructor
Office: Gould-Simpson 720
Office Hours: TBD
Textbook
Main references:
Reinforcement Learning: Theory and Algorithms, by Alekh Agarwal, Nan Jiang, Sham Kakade, and Wen Sun (AJKS);
Bandit Algorithms, by Tor Lattimore and Csaba Szepesvari (LS);
Mathematical Analysis of Learning Algorithms by Tong Zhang (TZ);
Bandits and RL by Akshay Krishnamurthy (AK);
Statistical Reinforcement Learning and Decision Making by Dylan Foster and Sasha Rakhlin (FR)
Additional references:
Introduction to Multi-Armed Bandits by Alex Slivkins
Reinforcement Learning: An Introduction by Richard Sutton and Andrew Barto
RL theory virtual seminars by Gergely Neu, Ciara Pike-Burke, and Csaba Szepesvari
Review for prerequisites
Notes for probability review and linear algebra review (from Stanford CS 229).
Some general guidance on approaching theory-related questions: Street Fighting Mathematics by Ryan O’Donnell.
See also The matrix cookbook, The Probability and Statistics Cookbook, and Calculus cheatsheet.
Scribing and LaTeX
We will be using the following LaTeX scribe-note template file and style file. See also Prof. Rob Schapire’s suggestions on preparing scribe notes. Please sign up for one scribing slot on the sign-up sheet.
Some useful LaTeX resources: Learn LaTeX in 30 minutes by Overleaf; Introduction to LaTeX by the MIT Research Science Institute
Presentation
All registered students will be asked to give an in-class presentation on a bandit / RL theory paper of their choice; please sign up for one presentation slot on the sign-up sheet. See the Presentation page for more details.
Reinforcement learning theory courses at other institutions
Courses at other institutions offer good and diverse lecture materials; here are a few examples:
Bandits and RL by Alekh Agarwal and Alex Slivkins
Reinforcement Learning by Alessandro Lazaric
Foundations of Reinforcement Learning by Chi Jin
Theoretical Foundations of Reinforcement Learning by Csaba Szepesvari
Statistical Reinforcement Learning by Nan Jiang
Foundations of Reinforcement Learning by Wen Sun and Sham Kakade
Useful tutorials
COLT 2021 Tutorial: Statistical Foundations of Reinforcement Learning by Akshay Krishnamurthy and Wen Sun
AAAI 2020 and ALT 2019 Tutorials: Exploration-Exploitation in Reinforcement Learning by Ronan Fruit, Mohammad Ghavamzadeh, Alessandro Lazaric, and Matteo Pirotta