AI 539: Machine Learning Challenges in the Real World

How does machine learning perform in the wild?

In this class, we will explore the challenges that machine learning systems face when they move from the laboratory into the real world.

We will draw inspiration from machine learning applications in astronomy, planetary science, autonomous driving, criminal justice, marketing, and other domains. Topics will include problem formulation, data collection and labeling, and evaluation techniques. We will also address thorny (but common) obstacles such as missing values, data that is not independent and identically distributed (i.i.d.), concept/domain shift, explainability, and more.

You will have the opportunity to apply these concepts and strategies to a data set of your choice. Student work will include reading, implementation, experimentation, analysis of results, and communication of findings. If you're curious about how to solve real problems with machine learning, this is the class for you. Prior experience with supervised machine learning methods is required (CS 434 or CS/AI 534, or instructor permission). Experience with Python, scikit-learn, and classical machine learning methods such as random forests is helpful.

Instructor: Kiri Wagstaff

Teaching Assistant: Grace Diehl

Class meetings (Winter 2023):
Tuesdays and Thursdays, 12-1:20 p.m. (KEC 1001)

Credits: 3

Evaluation:

Syllabus (PDF)

Schedule:

Jan. 10
  • What you will get out of this class
  • Examples of ML gone wrong

Getting to know your data

Jan. 12
  • What's in your data? Data set profiling
  • What to do when your data has holes (missing values)
Jan. 17
  • The tyranny of the majority: what to do about class imbalance
Jan. 19
  • Is your data set representative of its intended use?
  • Detrimental (and beneficial) sampling bias
Jan. 24
  • Algorithm and data bias

Getting to know your model

Jan. 26
  • Would you use your own classifier?
  • Methods for informative performance evaluation
Jan. 31
  • What kind of errors matter most?
  • Problem-specific evaluation

Data complexities

Feb. 2
  • What if your data has dependencies (space, time, groups)?
Feb. 7
  • The space-time continuum: structured data
  • Guest speaker: Dr. Rebecca Hutchinson
Feb. 9
  • Guest speaker: Dr. Hannah Kerner
Feb. 14
  • "Change is inevitable; growth is optional." - John Maxwell
  • Dealing with domain shift
  • Noisy data, noisy labels

Sending your model out into the world

Feb. 16
  • Label shift and covariate shift
  • What have we learned so far?
Feb. 21
  • How can you keep things running?
  • Deployment, maintenance, and trust
Feb. 23
  • Why did it do that? (explainability)
Feb. 28
  • When should you believe a prediction?
  • Confidence, uncertainty, and calibration

Going beyond the standard setting

March 2
  • When to have a human in the loop
  • The merits of active learning
March 7
  • The dark side: combating adversaries
March 9
  • Exploration and discovery (unsupervised learning)
March 14
  • Student project presentations
  • Bonus topic: Continual learning
March 16
  • Student project presentations
  • Bonus topic: Machine learning values