CS 461 Homework 1
Due: Midnight, January 15, 2009
Part 1: Join the mailing list (10 points)
Go to this URL:
and sign up for the course mailing list. Easy!
Part 2: Machine Learning in the real world (50 points)
Where do we find Machine Learning in use outside of this class? Given what we covered in Lecture 1, you should have a good idea of how to spot Machine Learning in action. Your goal for Part 2 is to go out on the web and find a
- news article,
- press release,
- product advertisement, or
- other noteworthy website
that describes a system, game, application, etc. that uses Machine Learning in some key fashion.
Next, you should compose two paragraphs in legible, polished English (you will be graded on the quality of your writing; use spell-check and proofread carefully):
- A summary of the machine learning component of your discovery. What kind of machine learning is being used?
- Your opinion, thoughts, and assessment of the system. Does it sound like it actually works, or are you skeptical? (There's a lot of hype out there!) Is it something you yourself would use, or are there drawbacks you see?
Create a text file called <yourlastname>-hw1-ml.txt (fill in your own last name) that includes:
- Standard assignment header: your name, the class name and number (CS 461, Machine Learning), the quarter (Winter 2009), the name of the assignment (Homework 1), and the assignment due date (January 15, 2009). You should include a header in this format on everything you turn in.
- The URL of your chosen Machine Learning system, game, application, etc.
- The two paragraphs described above.
Note: do not copy text from your online source. This is a violation of academic integrity.
Part 3: Supervised Learning (40 points)
Place your answers to these questions in a file called
What is the difference between classification and regression?
Imagine that you want to train a classifier to automatically rate restaurants, from 1 to 5 stars (5 being the best). List three numeric features you could use to represent the restaurants for the classifier.
Describe a classification scenario in which false positives are much worse (more costly) than false negatives.
Is k-Nearest Neighbors a parametric method or a nonparametric method? What does "parametric" mean in this context?
Consider the following two-dimensional data set. The training data contains two classes of objects which are represented by "+" and "-". A test instance whose class is unknown is represented by a "?". If we apply the k-nearest neighbors algorithm with k=3, the test instance will be classified as positive. Identify all (if any) odd values of k, from 1 to 9, for which its classification would be different.
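As a reminder of how the k-nearest neighbors vote works, here is a minimal Python sketch. The function name knn_classify and the five training points are made up for illustration; they are not the data set from this question, so the output says nothing about the answer. The sketch assumes Euclidean distance and Python 3.8 or later (for math.dist).

```python
import math
from collections import Counter


def knn_classify(train, query, k):
    """Return the majority label among the k training points nearest to query."""
    # Sort the training points by Euclidean distance to the query point.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    # Count the labels of the k closest points and return the most common one.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data, NOT the data set from the figure in this assignment.
train = [
    ((1.0, 1.0), "+"),
    ((1.5, 0.5), "+"),
    ((3.0, 3.0), "-"),
    ((3.5, 2.5), "-"),
    ((4.0, 4.0), "-"),
]
query = (1.2, 1.1)

# With two classes and odd k, the vote can never end in a tie.
for k in (1, 3, 5):
    print(k, knn_classify(train, query, k))
```

On this toy data the query is labeled "+" for k=1 and k=3, but "-" for k=5, because by then the three more distant "-" points outvote the two nearby "+" points. The same effect is what the question above asks you to look for in the real data set.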
What to turn in
Upload these files to CSNS:
<yourlastname>-hw1-ml.pdf (if you prefer to submit in PDF format)
In addition, email your response to Part 2 (without the assignment header, just the URL and your two paragraphs) to the CS 461 mailing list:
Feel free to explore the links posted by other students and discuss which ones you think are most interesting.