How does machine learning perform in the wild?
In this class, we will explore the challenges that machine
learning systems face when they move from the laboratory
into the real world.
We will draw inspiration from machine learning applied to problems
in astronomy, planetary science, autonomous driving,
criminal justice, marketing, and other domains. Topics will include
problem formulation, data collection and labeling, and
evaluation techniques, and we will address thorny (but
common) obstacles such as missing values, data that is not
independent and identically distributed, concept/domain
shift, explainability, and more.
You will have the
opportunity to apply these concepts and strategies to a data
set of your choice. Student work will include reading,
implementation, experimentation, analysis of results, and
communication of findings. If you're curious about how to
solve real problems with machine learning, this is the class
for you. Prior experience with supervised machine learning
methods (CS 434, CS/AI 534, or instructor permission) is
required. Experience with Python, scikit-learn, and
classical machine learning methods like random forests will
be beneficial.
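To give a concrete sense of the assumed background, here is a minimal sketch of the kind of basic scikit-learn workflow the course takes for granted: training and evaluating a random forest on a held-out split. The built-in toy data set and parameter values are illustrative only, not course material.

```python
# Minimal sketch of the scikit-learn workflow assumed as background.
# The data set and parameters here are illustrative, not course material.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Load a small built-in data set and hold out a test split.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Fit a classical model: a random forest classifier.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Evaluate on the held-out split.
print(classification_report(y_test, clf.predict(X_test)))
```

If this workflow looks familiar, you have the programming background the course expects; the class itself focuses on everything that happens around this loop.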
Instructor:
Kiri Wagstaff
Teaching Assistant:
Grace Diehl
Class meetings (Winter 2023):
Tuesdays and Thursdays, 12:00-1:20 p.m. (KEC 1001)
Credits: 3
Evaluation:
- 10% warm-up reading
- 10% in-class activities
- 30% try-it-out assignments
- 50% hands-on project
Syllabus (PDF)
Comments from previous students:
- "This course provided helpful techniques that I can use in the real world to handle problems that I would have otherwise not known how to tackle. This course has helped me move forward a lot on my research."
- "Lecture & class activities were actively engaging and used a lot of digestible real-world examples."
- "The course material is novel and valuable. It touches on many aspects in the field which are not adequately covered in other courses. Prof. Wagstaff gave timely, helpful, and supportive feedback and did a great job at making the course inclusive, interactive, and rewarding."
Schedule:
Date | Topics
Jan. 10 | What you will get out of this class; Examples of ML gone wrong

Getting to know your data
Jan. 12 | What's in your data? Data set profiling; What to do when your data has holes (missing values)
Jan. 17 | The tyranny of the majority: what to do about class imbalance
Jan. 19 | Is your data set representative of its intended use? Detrimental (and beneficial) sampling bias
Jan. 24 | Algorithm and data bias

Getting to know your model
Jan. 26 | Would you use your own classifier? Methods for informative performance evaluation
Jan. 31 | What kind of errors matter most? Problem-specific evaluation

Data complexities
Feb. 2 | What if your data has dependencies? (space, time, groups)
Feb. 7 | The space-time continuum: structured data; Guest speaker: Dr. Rebecca Hutchinson
Feb. 9 | Guest speaker: Dr. Hannah Kerner
Feb. 14 | "Change is inevitable; growth is optional." - John Maxwell; Dealing with domain shift; Noisy data, noisy labels

Sending your model out into the world
Feb. 16 | Label shift and covariate shift; What have we learned so far?
Feb. 21 | How can you keep things running? Deployment, maintenance, and trust
Feb. 23 | Why did it do that? (explainability)
Feb. 28 | When should you believe a prediction? Confidence, uncertainty, and calibration

Going beyond the standard setting
March 2 | When to have a human in the loop; The merits of active learning
March 7 | The dark side: combating adversaries
March 9 | Exploration and discovery (unsupervised learning)
March 14 | Student project presentations; Bonus topic: Continual learning
March 16 | Student project presentations; Bonus topic: Machine learning values
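For a flavor of the first unit ("Getting to know your data"), the short sketch below checks two issues covered early in the course, missing values and class imbalance, before any modeling. The file name and label column are hypothetical placeholders, not course data.

```python
# Minimal data-profiling sketch in the spirit of the "Getting to know your
# data" unit: check for missing values and class imbalance before modeling.
# The file name and "label" column below are hypothetical placeholders.
import pandas as pd

df = pd.read_csv("my_dataset.csv")

# What fraction of each column is missing?
print(df.isna().mean().sort_values(ascending=False))

# How imbalanced are the classes?
print(df["label"].value_counts(normalize=True))
```

Surprises at this stage (columns that are mostly empty, classes that are vanishingly rare) are exactly the kinds of issues the course teaches you to diagnose and handle.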