AI 539: Machine Learning Challenges in the Real World

How does machine learning perform in the wild?

In this class, we will explore the challenges that machine learning systems face when they move from the laboratory into the real world.

We will be inspired by machine learning applied to problems from astronomy, planetary science, autonomous driving, criminal justice, marketing, etc. Topics will include problem formulation, data collection/labeling, and evaluation techniques, and we will address thorny (but common) obstacles such as missing values, data that is not independently and identically distributed, concept/domain shift, explainability, and more.

You will have the opportunity to apply these concepts and strategies to a data set of your choice. Student work will include reading, implementation, experimentation, analysis of results, and communication of findings. If you're curious about how to solve real problems with machine learning, this is the class for you. Prior experience with supervised machine learning methods (CS 434, CS/AI 534, or instructor permission) is required. Experience with Python, scikit-learn, and classical machine learning methods like random forests will be beneficial.


Instructor: Kiri Wagstaff

Teaching Assistant: Grace Diehl

Class meetings (Winter 2023):
Tuesdays and Thursdays, 12-1:20 p.m. (KEC 1001)

Credits: 3

Evaluation:

10% warm-up reading
10% in-class activities
30% try-it-out assignments
50% hands-on project

Syllabus (PDF)

Comments from previous students:

"This course provided helpful techniques that I can use in the real world to handle problems that I would have otherwise not known how to tackle. This course has helped me move forward a lot on my research."
"Lecture & class activities were actively engaging and used a lot of digestible real-world examples."
"The course material is novel and valuable. It touches on many aspects in the field which are not adequately covered in other courses. Prof. Wagstaff gave timely, helpful, and supportive feedback and did a great job at making the course inclusive, interactive, and rewarding."

Schedule:

Date	Topics
Jan. 10	What you will get out of this class
	Examples of ML gone wrong

Getting to know your data
Jan. 12	What's in your data? Data set profiling
	What to do when your data has holes (missing values)
Jan. 17	The tyranny of the majority: what to do about class imbalance
Jan. 19	Is your data set representative of its intended use? Detrimental (and beneficial) sampling bias
Jan. 24	Algorithm and data bias

Getting to know your model
Jan. 26	Would you use your own classifier? Methods for informative performance evaluation
Jan. 31	What kind of errors matter most? Problem-specific evaluation

Data complexities
Feb. 2	What if your data has dependencies? (space, time, groups)
Feb. 7	The space-time continuum: structured data
	Guest speaker: Dr. Rebecca Hutchinson
Feb. 9	Guest speaker: Dr. Hannah Kerner
Feb. 14	"Change is inevitable; growth is optional." (John Maxwell)
	Dealing with domain shift
	Noisy data, noisy labels

Sending your model out into the world
Feb. 16	Label shift and covariate shift
	What have we learned so far?
Feb. 21	How can you keep things running? Deployment, maintenance, and trust
Feb. 23	Why did it do that? (explainability)
Feb. 28	When should you believe a prediction? Confidence, uncertainty, and calibration

Going beyond the standard setting
March 2	When to have a human in the loop
	The merits of active learning
March 7	The dark side: combating adversaries
March 9	Exploration and discovery (unsupervised learning)
March 14	Student project presentations
	Bonus topic: Continual learning
March 16	Student project presentations
	Bonus topic: Machine learning values