CSC 696H: Topics in Bandits and Reinforcement Learning Theory (Fall 2024)

Many modern applications, such as e-commerce, robotics, healthcare, and autonomous driving, can be viewed as sequential decision-making problems: a learning agent takes a sequence of actions to maximize its reward. Among these, bandit problems (and their contextual variants) concern decision making with independent observations, where the learner only sees the reward of the action it takes; reinforcement learning (RL) generalizes bandits by taking into account the temporal dependencies among observations, thereby allowing the agent to learn intelligent behavior that maximizes long-term return. This course studies bandits and RL from a theoretical perspective: when and how can we design bandit and RL algorithms with provable guarantees? Specifically, we will look at recent advances in this area, such as principled exploration in bandits and RL (in both tabular and function approximation settings), policy optimization in RL, and offline RL. In the first part of the course, students will learn the mathematical background needed to design and analyze bandit and RL algorithms (concentration inequalities, online learning, Markov decision processes, and optimization tools); a concrete taste of the bandit protocol is sketched below. In the second part, each registered student will present a recent paper on bandit or RL theory.
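To make the bandit protocol concrete, here is a minimal Python sketch of the classical UCB1 algorithm; it illustrates the point that the learner observes only the reward of the arm it pulls. This is an illustrative sketch, not course material: the function names (`ucb1`, `pull`) and the Bernoulli arm means 0.4 and 0.6 are made up for the example.

```python
import math
import random

def ucb1(n_arms, horizon, pull):
    """Minimal UCB1 sketch: `pull(arm)` returns a stochastic reward in [0, 1]."""
    counts = [0] * n_arms   # number of times each arm has been pulled
    means = [0.0] * n_arms  # empirical mean reward of each arm

    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # pull every arm once to initialize
        else:
            # optimism in the face of uncertainty:
            # empirical mean + confidence width (from Hoeffding's inequality)
            arm = max(range(n_arms),
                      key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]))
        r = pull(arm)  # bandit feedback: only the chosen arm's reward is observed
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # incremental mean update
    return means

# Example run: two Bernoulli arms with success probabilities 0.4 and 0.6.
print(ucb1(2, 5000, lambda a: float(random.random() < [0.4, 0.6][a])))
```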

Logistics info

Time and venue: Fridays 1:30-4:10pm, Gould-Simpson 701

D2L course webpage: lecture video recordings can be found at “UA Tools” -> “Panopto”

Gradescope page: enroll code YRZR4Z

Syllabus

We will be using Piazza for important announcements and Q&A. Please self-enroll here.

Instructor

Chicheng Zhang

Office: Gould-Simpson 720

Office Hours: TBD

Textbook

Main references:

Reinforcement Learning: Theory and Algorithms, by Alekh Agarwal, Nan Jiang, Sham Kakade, and Wen Sun (AJKS);

Bandit Algorithms, by Tor Lattimore and Csaba Szepesvari (LS);

Mathematical Analysis of Learning Algorithms by Tong Zhang (TZ);

Bandits and RL by Akshay Krishnamurthy (AK);

Statistical Reinforcement Learning and Decision Making by Dylan Foster and Sasha Rakhlin (FR)

Additional references:

Introduction to Multi-Armed Bandits by Alex Slivkins

Reinforcement Learning: An Introduction by Richard Sutton and Andrew Barto

RL theory virtual seminars by Gergely Neu, Ciara Pike-Burke, and Csaba Szepesvari

Review for prerequisites

Notes for probability review and linear algebra review (from Stanford CS 229).

Some general guidance on approaching theory-related questions: Street Fighting Mathematics by Ryan O’Donnell.

See also The matrix cookbook, The Probability and Statistics Cookbook, and Calculus cheatsheet.

Scribing and LaTeX

We will be using the following scribe note LaTeX template file and style file. See also Prof. Rob Schapire’s suggestions on preparing scribe notes. Please sign up for one scribing slot on the sign-up sheet.

Some useful LaTeX resources: Learn LaTeX in 30 minutes by Overleaf; Introduction to LaTeX by the MIT Research Science Institute
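For orientation, a bare-bones scribe note might look like the sketch below. This is an illustrative skeleton only, not the course’s actual template (please use the template and style file linked above); the lecture title and theorem are placeholders, though the stated form of Hoeffding’s inequality is standard.

```latex
\documentclass[11pt]{article}
\usepackage{amsmath, amssymb, amsthm}

\newtheorem{theorem}{Theorem}
\newtheorem{lemma}[theorem]{Lemma}

\title{CSC 696H Scribe Notes: Lecture 1}  % placeholder title
\author{Scribe: Your Name}
\date{Fall 2024}

\begin{document}
\maketitle

\section{Recap}
% Briefly summarize the previous lecture.

\section{Main results}
% State results as theorems/lemmas, then give proofs.
\begin{theorem}[Hoeffding's inequality]
Let $X_1, \dots, X_n$ be independent random variables with $X_i \in [0, 1]$.
Then for any $\epsilon > 0$,
\[
  \Pr\!\left( \frac{1}{n} \sum_{i=1}^n \left( X_i - \mathbb{E}[X_i] \right) \ge \epsilon \right)
  \le \exp\!\left( -2 n \epsilon^2 \right).
\]
\end{theorem}

\end{document}
```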

Presentation

All registered students will be asked to give an in-class presentation on a bandit / RL theory paper of their choice; please sign up for one presentation slot on the sign-up sheet. See the Presentation page for more details.

Reinforcement learning theory courses at other institutions

Courses at other institutions offer good and diverse lecture materials; here are a few examples:

Bandits and RL by Alekh Agarwal and Alex Slivkins

Reinforcement Learning by Alessandro Lazaric

Foundations of Reinforcement Learning by Chi Jin

Theoretical Foundations of Reinforcement Learning by Csaba Szepesvari

Statistical Reinforcement Learning by Nan Jiang

Foundations of Reinforcement Learning by Wen Sun and Sham Kakade

Useful tutorials

COLT 2021 Tutorial: Statistical Foundations of Reinforcement Learning by Akshay Krishnamurthy and Wen Sun

AAAI 2020 and ALT 2019 Tutorials: Exploration-Exploitation in Reinforcement Learning by Ronan Fruit, Mohammad Ghavamzadeh, Alessandro Lazaric, and Matteo Pirotta