CSC 588: Machine Learning Theory - Spring 2022
Tentative schedule
We will follow a schedule similar to last year’s; you can refer to the scribe / handwritten notes there if you would like to review or read ahead.
| Date | Topics | Notes | Additional readings | Homework |
|---|---|---|---|---|
| Jan 13 | Introduction, motivation, course mechanics | slides | SSBD App B.1 and B.2 | HW1 (calibration) |
| Jan 18 | The supervised learning pipeline; The PAC learning framework | Handwritten note 1 | SSBD Chap 2, Sec 3.1 | |
| Jan 20 | The consistency algorithm and its analysis; Agnostic PAC learning | Handwritten note 2 | SSBD Chap 2, Sec 3.1 | |
| Jan 25 | Agnostic PAC Learning; ERM analysis; begin Hoeffding’s Inequality | Handwritten note 3, Scribe note by Renee Zhang | SSBD Chap 4, Sec B.4 | |
| Jan 27 | Finish Hoeffding’s Inequality; Bernstein’s Inequality; ERM analysis with finite hypothesis class | Handwritten note 4, Scribe note by Yanan Wang | SSBD Chap 4, Sec B.5 | |
| Feb 1 | Finish ERM analysis with finite hypothesis classes; PAC Learning with infinite hypothesis classes; learning rectangles | Scribe note by Ryan Murphy | SSBD Sec 6.1, Exercise 2.3 | |
| Feb 3 | Begin VC theory: definitions and examples | Handwritten note 5 | SSBD Sec 6.2, 6.3 | HW2 |
| Feb 8 | VC dimension of homogeneous linear classifiers; Sauer’s Lemma | Handwritten note 6 | SSBD Sec 6.5 | |
| Feb 10 | Finish Sauer’s Lemma; VC dimension of composite hypothesis classes; back to uniform convergence | Scribe note by Jing Wu | SSBD Sec 6.6 | |
| Feb 15 | Application: Glivenko-Cantelli Theorem; Proof of uniform convergence for VC classes; Rademacher complexity; Massart’s Lemma | Handwritten note 7, Scribe note by Yuanyuan Sun | SSBD Sec 6.5.2, 26.1, 28.1 | |
| Feb 17 | Proof of uniform convergence for VC classes cont’d: McDiarmid’s Inequality, Symmetrization Lemma | Handwritten note 8, Scribe note by Minhang Zhou | SSBD Sec 6.5.2, 26.1, 28.1 | |
| Feb 22 | Finish symmetrization lemma; Fundamental Theorem of Statistical Learning; Lower bounds for PAC learning | Handwritten note 9 | SSBD Sec 5.1 | |
| Feb 24 | Finish lower bounds for PAC learning; Algorithm / Model selection in supervised learning | Handwritten note 10 | SSBD Chap 7 | HW3 |
| Mar 1 | Bias-Complexity Tradeoff; Model selection in supervised learning: validation and SRM | | SSBD Chap 7 | |
| Mar 3 | Finish SRM analysis; AdaBoost and its training error analysis | slides, Rob Schapire’s boosting tutorial, Handwritten note 11 | SSBD Chap 10 | |
| Mar 8 | Spring Recess | | | |
| Mar 10 | Spring Recess | | | |
| Mar 15 | Finish boosting; Statistical learning with general losses: Rademacher complexity analysis | slides | SSBD 26.1, 26.2, 26.4 | |
| Mar 17 | Finish the proof of the contraction inequality; Begin margin-based generalization error bounds | slides, Handwritten note 12, Handwritten note 13 | SSBD 26.3, 15.1 | |
| Mar 22 | Finish margin bound; Regularized loss minimization formulations | Handwritten note 14 | | |
| Mar 24 | Johnson-Lindenstrauss lemma and margins; Stability and generalization | slides, Handwritten note 15 | SSBD Chap 13, J-L Notes by Sham Kakade and Greg Shakhnarovich | |
| Mar 29 | Finish Stability; Online learning: definitions and examples | Handwritten note 16 | O Chap 1, Haipeng Luo’s online learning lecture notes 1 | |
| Mar 31 | Online to batch conversion; Azuma’s inequality | Handwritten note 17 | O Chap 3 | |
| Apr 5 | Finish Azuma’s inequality; Online gradient descent; subgradients | | O Chap 2 | |
| Apr 7 | Online gradient descent and its regret analysis | Handwritten note 18 | O Chap 2 | HW4 |
| Apr 12 | Online mirror descent; Fenchel conjugate | Handwritten note 19 | O Thm 2.19, 5.2.1, 6.1-6.3 | |
| Apr 14 | Online mirror descent examples: p-norm, exponentiated gradient; regret analysis | Handwritten note 20 | O 6.4.1, 6.6, 6.7, 6.4 | |
| Apr 19 | Weather forecasting (expert problem) revisited; unknown horizon; lower bounds; Efficient empirical loss minimization | Handwritten note 21 | O 6.8, 5.1, SSBD 14.5.1 | |
| Apr 21 | Efficient empirical loss minimization cont’d; OCO with strongly-convex functions; Begin multi-armed bandits | Handwritten note 22 | SSBD 14.4.4, 14.5.3, AJKS Chap 6.1, LS Chap 7 | |
| Apr 26 | Multi-armed bandits and the UCB algorithm | Handwritten note 23 | AJKS Chap 6.1, LS Chap 7 | |
| Apr 28 | Episodic RL; the optimism approach to RL | Handwritten note 24 | AJKS Chap 7.1-7.3 | |
| May 3 | Project presentations: Jing Wu, An empirical evaluation on stochastic optimization algorithms (SGD and SDCA); Minhang Zhou, Pure exploration in multi-armed bandit problems; Ryan Murphy, Model-free online learning in episodic MDPs; Yanan Wang, Fairness-accuracy analysis in fair machine learning; Yuanyuan Sun, Compressed sensing with RIP | | | |
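The concentration-of-measure lectures above (Hoeffding’s inequality, Jan 25 and 27) are easy to sanity-check numerically. Below is a minimal sketch, not part of the course materials; the sample size, tolerance, and Bernoulli parameter are arbitrary choices for illustration.

```python
import math
import random

# Hoeffding: for n i.i.d. draws of a [0,1]-bounded random variable,
# P(|sample mean - E[X]| >= eps) <= 2 * exp(-2 * n * eps^2).
random.seed(0)
n, eps, trials = 500, 0.05, 2000
p = 0.3  # Bernoulli(p) draws, so E[X] = p

# Count how often the sample mean deviates from p by at least eps.
violations = 0
for _ in range(trials):
    mean = sum(random.random() < p for _ in range(n)) / n
    if abs(mean - p) >= eps:
        violations += 1

empirical = violations / trials
bound = 2 * math.exp(-2 * n * eps ** 2)
print(f"empirical deviation freq = {empirical:.3f}, Hoeffding bound = {bound:.3f}")
```

For these parameters the Hoeffding bound is quite loose relative to the observed deviation frequency, which is the usual picture for bounded, well-concentrated sums.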
Last year’s (Spring 2021) schedule, for reference:

| Date | Topics | Scribe notes | Notes | Additional readings | Homework |
|---|---|---|---|---|---|
| Jan 19 | The PAC learning framework; the consistency algorithm | Scribe note (0119/) (Please feel free to make a copy of this as a starting point when scribing) | Note 2 | SSBD Chap 2, Sec 3.1 | |
| Jan 21 | Analysis of the consistency algorithm; agnostic PAC learning; Hoeffding’s Inequality | | Note 3 | SSBD Chap 2, Sec 3.1 | |
| Jan 26 | Proof of Hoeffding’s inequality; Bernstein’s inequality | Scribe note (0126/) by Brian Toner | Note 4 | SSBD B.4, B.5 | |
| Jan 28 | Analysis of ERM; VC theory | Scribe note (0128/) by Alonso Granados | Note 5 | SSBD Chap 4, Sec 6.1, 6.2 | |
| Feb 2 | VC dimension examples; Sauer’s Lemma | Scribe note (0202/) by Katherine Best | Note 6 | SSBD Sec 6.2 - 6.5.1 | HW1 |
| Feb 4 | Proof of Sauer’s Lemma; VC dimension of composite hypothesis classes | Scribe note (0204/) by Marium Yousuf | Note 7 | SSBD Sec 6.2 - 6.5.1 | |
| Feb 9 | Uniform convergence via Rademacher complexity | Scribe note (0209/) by Brady Gales | Note 8 | SSBD Sec 6.5.2, 26.1, 28.1 | |
| Feb 11 | Proof of uniform convergence of VC classes | Scribe note (0211/) by Yinan Li | Note 9 | SSBD Sec 6.5.2, 26.1, 28.1 | |
| Feb 16 | Lower bounds of PAC learning with VC classes; fundamental theorem of statistical learning | Scribe note (0216/) by Xiaolan Gu | Note 10 | SSBD Sec 5.1 | |
| Feb 18 | Error decomposition in statistical learning; model selection | Scribe note (0218/) by Jie Bian | Note 11 | SSBD Chap 7 | |
| Feb 23 | Finish SRM; AdaBoost and its training error analysis | Scribe note (0223/) by Yichen Li | Note 12 | SSBD Chap 10 | |
| Feb 25 | No class - Reading day | | | | |
| Mar 2 | Weak learnability implies “linear separability” through minimax theorem; Margin-based generalization error bounds for AdaBoost and linear classification | Scribe note (0302/) by Ryan Sullivant | Note 13 | SSBD 26.1, 26.2, 26.4 | HW2 |
| Mar 4 | Proof of margin-based generalization error bounds; Contraction inequality of Rademacher complexity; SVM formulations | Scribe note (0304/) by Ruby Abrams | Note 14 | SSBD 26.3 | |
| Mar 9 | No class - Reading day | | | | |
| Mar 11 | ell_2-norm-based margin bounds; Extensions of SVM; Regularized loss minimization formulations | Scribe note (0311/) by Shahriar Golchin | Note 15 | SSBD 26.3; Chap 15; Spectrally-normalized margin bounds for neural networks | |
| Mar 16 | Stability, strong convexity, and regularization | Scribe note (0316/) by Adrienne Kinney | Note 16 | SSBD Chap 13 | |
| Mar 18 | Stability-fitting tradeoff; online learning: definitions and examples | Scribe note (0318/) by Sarah Luca | Note 17 | SSBD Chap 13.4, O Chap 1, Haipeng Luo’s online learning lecture notes 1 (which this lecture is based heavily on) | |
| Mar 23 | Online to batch conversion; Azuma’s Inequality; online gradient descent | Scribe note (0323/) by Sheldon Deeny | Note 18 | O Chap 3, Chap 2 | |
| Mar 25 | Analysis of online gradient descent; Online mirror descent: basic definitions (class meeting cancelled; pre-recorded lecture on Panopto) | Scribe note (0325/) by Caleb Dahlke | Note 19 | O Chap 2, Sec 6.1-6.3 | |
| Mar 30 | Online mirror descent examples: p-norm, exponential weights; Fenchel conjugate | Scribe note (0330/) by Erik Wessel | Note 20 | O Theorem 2.19, 5.2.1, 6.4.1, 6.6, 6.7 | HW3 |
| Apr 1 | Online mirror descent analysis; Online learning odds & ends: unknown time horizon, lower bounds, Follow the Regularized Leader | Scribe note (0401/) by Yao Zhao | Note 21 | O 6.4, 5.1, 7.1 | |
| Apr 6 | Online gradient descent for strongly convex functions; kernel methods | Scribe note (0406/) by Zisu Wang | Note 22 | SSBD 14.4.4, 14.5.3, 15.5, 16.2, 16.3 | |
| Apr 8 | Finish kernel methods; online Newton step for exp-concave functions | Scribe note (0408/) by Robert Vacareanu | Note 23 | SSBD 16.3, O 7.9 | |
| Apr 13 | Finish online Newton step; begin multi-armed bandits (MAB) | | Note 24 | LS Chap 4 | |
| Apr 15 | Explore-then-commit; Upper confidence bound (UCB) algorithm and analysis | Scribe note (0415/) by Jesse Friedbaum | Note 25 | LS Chap 6, 7 | |
| Apr 20 | Finish UCB analysis; Adversarial MAB; EXP3 algorithm | Scribe note (0420/) by Bohan Li | Note 26 | LS Chap 11 | HW4 |
| Apr 22 | Stochastic linear contextual bandits and the LinUCB/OFUL algorithm | Scribe note (0422/) by Dan Li | Note 27 | LS Chap 19, 20 | |
| Apr 27 | Episodic MDPs, Optimistic Q-learning (based on the original Optimistic Q-Learning paper and Haipeng Luo’s lecture note) | | Note 28 (page 7 onwards will not appear in the exam) | Chi Jin’s RL theory course notes; RL theory book by Alekh Agarwal, Nan Jiang, Sham Kakade, and Wen Sun | |
| Apr 29 | Project presentation I | | | | |
| May 4 | Project presentation II | | | | |
A fragment of an older offering’s schedule (late-semester lectures only; one row is missing its date):

| Date | Topics | Notes | Additional readings | Homework |
|---|---|---|---|---|
| Nov 7 | Prediction with expert advice (1) | | SSBD Sec 21.2 | |
| Nov 12 | Prediction with expert advice (2) | | | |
| Nov 14 | Stochastic multi-armed bandits (1) | | LS Chaps 6, 7 | |
| Nov 19 | Stochastic multi-armed bandits (2) | | | |
| Nov 21 | Adversarial multi-armed bandits | | LS Chap 11 | |
| | Stochastic linear bandits (1) | | LS Chap 19 | |
| Dec 3 | Stochastic linear bandits (2) | | LS Chap 20 | |

An earlier fall offering’s schedule:

| Date | Topics | Notes | Additional readings | Homework |
|---|---|---|---|---|
| Aug 27 | Introduction, motivation, course mechanics | slides, probability review | SSBD App B.1 and B.2 | HW0 |
| Aug 29 | Concentration of measure | concentration of measure note 1 | SSBD App B.4 | |
| Sep 3 | Chernoff bound for Bernoulli random variables, McDiarmid’s Inequality | concentration of measure note 2 | SSBD App B.3, Lemma 26.4 | HW0 due in class |
| Sep 5 | The PAC learning framework, finite classes | PAC learning note | SSBD Chap 2, Sec 3.1 | |
| Sep 10 | Agnostic PAC learning, infinite classes | | SSBD Sec 3.2-3.3, Chap 4, Sec 6.1 | |
| Sep 12 | VC Theory, Sauer’s Lemma | VC Theory note | SSBD Sec 6.2 - 6.5.1 | |
| Sep 17 | Rademacher complexity | Uniform convergence / Rademacher complexity note | SSBD Sec 6.5.2, 26.1, 28.1 | HW1 |
| Sep 19 | Information-theoretic lower bounds of PAC learning | | SSBD 28.2.1 | |
| Sep 24 | Le Cam’s method | | SSBD 28.2.1 | |
| Sep 26 | Assouad’s method, fundamental theorem of statistical learning | Lower bound note | SSBD Sec 28.2.2, Sec 5.1 | HW2 |
| Oct 1 | Support Vector Machine (SVM); the maximum margin hyperplane problem | | SSBD Chap 15, Secs 26.2, 26.3 | |
| Oct 3 | Soft margin, nonlinear mapping, the dual of SVM | | SSBD Sec 15.4 | |
| Oct 8 | Kernel trick | | SSBD Chap 16 | |
| Oct 10 | Margin bounds, generalization bounds of SVM | | SSBD Sec 26.3 | |
| Oct 15 | Proof of margin bounds via Rademacher complexity | SVM note | SSBD Secs 26.2, 26.4 | |
| Oct 17 | Model selection, structural risk minimization | Model selection note | SSBD Chap 7, Modern bias-variance tradeoff | Midterm (take home) |
| Oct 22 | Boosting: AdaBoost | | SSBD Chap 10 | |
| Oct 24 | Boosting and margin bound; two styles of margin bounds | Boosting note | SSBD Sec 26.2, Boosted ResNet paper | |
| Oct 29 | Intro to Online learning; Online classification | | SSBD Sec 21.1 | HW3 |
| Oct 31 | Minimax analysis: Littlestone’s dimension; Expert problem | Online classification note | SSBD Sec 21.1.1 | |
| Nov 5 | Cover’s Impossibility Result; Decision-theoretic online learning | Handwritten notes | SSBD Sec 21.2 | |
| Nov 7 | Hedge; Online to Batch conversion | Online to batch conversion notes | SSBD Sec 21.2.1, Hedge paper; Loss range adaptivity | |
| Nov 12 | Convexity, subgradients | | SSBD Sec 12.1, 14.2 | |
| Nov 14 | Bregman divergence, norms, dual norms | | | |
| Nov 19 | Lipschitzness, strong convexity and smoothness | | SSBD Sec 12.1, 13.3 | HW4 |
| Nov 21 | Legendre-Fenchel conjugacy, online convex optimization, Follow the regularized leader (FTRL) | OCO note | Section 2.7, 2.3 of Shalev-Shwartz’s Survey | |
| Nov 26 | Analysis of FTRL; FTRL Examples | | H Sec 5.3 | |
| Nov 28 | No Class. Thanksgiving Holiday! | | | |
| Dec 3 | Online linear classification with margins; FTRL with Adaptive regularization | | Section 3.3 of Shalev-Shwartz’s Survey; H Sec 5.6 | |
| Dec 5 | Adagrad, online optimization of strongly convex functions | | H Sec 3.3 | |
| Dec 10 | Final presentation | | | Topics to review |
