CSC 380: Principles of Data Science (Spring 2023)
This course introduces students to principles of data science that are necessary for computer scientists to make effective decisions in their professional careers. A number of computer science sub-disciplines now rely on data collection and analysis. For example, computer systems are now complicated enough that comparing the execution performance of two different programs becomes a statistical estimation problem rather than a deterministic computation. This course teaches students the basic principles of how to properly collect and process data sources in order to derive appropriate conclusions from them. The course has three main components: data analysis, machine learning, and a project where students apply the concepts discussed in class to a substantial open-ended problem.
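As a rough illustration of why comparing the execution performance of two programs is a statistical estimation problem, the sketch below times two functions repeatedly and puts a bootstrap confidence interval on the difference in mean running time. It is not course material: the helper names (`time_runs`, `bootstrap_mean_diff_ci`), the two example functions, the number of runs, and the choice of a percentile bootstrap are all assumptions made for this example.

```python
import random
import statistics
import timeit


def time_runs(func, n_runs=30):
    """Return a list of wall-clock times in seconds, one per call of func."""
    return [timeit.timeit(func, number=1) for _ in range(n_runs)]


def bootstrap_mean_diff_ci(a, b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap interval for mean(a) - mean(b).

    a and b are lists of measurements (hypothetical timing samples here).
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping its original size.
        resampled_a = rng.choices(a, k=len(a))
        resampled_b = rng.choices(b, k=len(b))
        diffs.append(statistics.fmean(resampled_a) - statistics.fmean(resampled_b))
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_resamples)]
    hi = diffs[max(int((1 - alpha / 2) * n_resamples) - 1, 0)]
    return lo, hi


if __name__ == "__main__":
    data = list(range(2000, 0, -1))

    def program_a():
        # Hypothetical "program A": sort with the built-in sort.
        return sorted(data)

    def program_b():
        # Hypothetical "program B": sort twice, so it should be slower.
        return sorted(sorted(data), reverse=True)

    times_a = time_runs(program_a, n_runs=20)
    times_b = time_runs(program_b, n_runs=20)
    lo, hi = bootstrap_mean_diff_ci(times_a, times_b)
    print(f"95% bootstrap CI for mean(A) - mean(B): [{lo:.2e}, {hi:.2e}] s")
```

Because individual timings vary from run to run, the interval will differ between executions. If it excludes zero, the measured difference is unlikely to be noise alone under the assumptions of the bootstrap.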
Logistics info
Time and venue: TuTh 2-3:15pm, M. Pacheco ILC 130
Piazza link Access code: wildcats
Gradescope Entry code: BBRJBW (NB: Please make sure your gradescope email address is the same as the one you have on D2L.)
D2L course webpage: lecture video recordings will be at “UA Tools” -> “Zoom” (NB: Zoom links are for recordings only and are not for live-streaming lectures.)
We will be using Piazza to make important announcements and do Q&As. Some general rules:
- If you have technical questions, try to pose them as generally as possible, to promote discussion among the class.
- If you have private questions, please make a private Piazza post instead of sending an email. This helps us process your requests much faster.
Course staff
Instructors: Chicheng Zhang and Kyoungseok Jang; Emails: {chichengz, ksajks} at arizona.edu
Teaching assistants: Saiful Islam Salim, Yinan Li, and Sayyed Faraz Mohseni; Emails: {saifulislam, yinanli, mohseni} at arizona.edu
Office Hours:
Chicheng Zhang: Tuesdays 3:30-4:30pm, Gould-Simpson 720 (before Feb 28)
Kyoungseok Jang: Tuesdays 3:30-4:30pm, Gould-Simpson 732 (after Feb 28)
Saiful Islam Salim: Wednesdays 10-11am, Gould-Simpson 856
Tugay Bilgis: Thursdays 10-11am, Gould-Simpson 942
Yinan Li: Mondays 12:45-1:45pm, Gould-Simpson 856
Sayyed Faraz Mohseni: Fridays 12-1pm, Gould-Simpson 837
Textbook
There is no single designated textbook for this course. Much of the course material and many of the assigned readings will be based on the following books:
WJ: Watkins, J., “An Introduction to the Science of Statistics: From Theory to Implementation”
MK: Murphy, K., “Machine Learning: A Probabilistic Perspective.” MIT Press, 2012 (accessible online via UA library)
WL: Wasserman, L., “All of Statistics: A Concise Course in Statistical Inference.” Springer, 2004 (accessible online via UA library)
Other useful resources
- You should have no difficulty in Python programming.
- Notes for probability review and linear algebra review from Stanford’s CS 229 course.
- The matrix cookbook, The Probability and Statistics Cookbook, and Calculus cheatsheet (recommended by Prof. Kwang-Sung Jun).
- You may find using LaTeX helpful in writing homeworks or reports. Some useful LaTeX resources: Learn LaTeX in 30 minutes by Overleaf; Introduction to LATEX by MIT Research Science Institute
Machine learning courses at UA
CSC 535 Probabilistic Graphical Models by Kobus Barnard
ISTA 457/INFO 557 Neural Networks by Steven Bethard
CSC 665 Online Learning and Multi-armed Bandits by Kwang-Sung Jun
INFO 521 Introduction to Machine Learning by Clayton Morrison
CSC 665 Advanced Topics in Probabilistic Graphical Models by Jason Pacheco
CSC 580 Principles of Machine Learning by Carlos Scheidegger
MIS 601 Statistical Foundations of Machine Learning by Junming Yin
MATH 574M Statistical Machine Learning by Helen Zhang
CSC 696H: Topics in Reinforcement Learning Theory by Chicheng Zhang