CS 461 Homework 5

Due: Midnight, March 19, 2009

Please put your answers to these questions in <yourlastname>-hw5.txt (or PDF).

Part 1: Reinforcement Learning (50 points)

1. The goal of a supervised learner is to output the correct label (class for classification, real value for regression). What kind of output does a reinforcement learner produce? (5 points)
2. What is the difference between a value function and a policy? (5 points)
3. Give a (simple) example from your own life that could be modeled as a reinforcement learning problem (e.g., riding a bike or driving a car, but pick something else). (5 points)
4. Define your example from question 3 as an RL problem by listing the possible states (5 points), actions (5 points), and how you would define the rewards (5 points).
5. How did TD-Gammon train the neural network it used to estimate the value function V? That is, how did it get training examples for the neural network? (5 points)
6. If a reinforcement learner uses a gamma (discount rate) value that is large (near 1.0), what impact does this have on its learning? (5 points)
7. Give a specific example of an episodic task. (5 points)
8. When can you use model-based reinforcement learning (as opposed to TD-learning)? (5 points)

Part 2: Ensemble Learning (50 points)

1. What does the No Free Lunch theorem state? (5 points)
2. How does bagging re-use the training data? (5 points)
3. How does AdaBoost re-use the training data? (5 points)
4. What does it mean for a learner to be "unstable"? (5 points)
5. AdaBoost uses a weighted combination of the output of each learner in the ensemble. How are those weights determined? (5 points)
6. Why do we want to use "weak" learners when boosting? (5 points)
7. Weka: Go to the Explorer and load wine.arff. Select the "Classify" tab. Under Classifier, choose J48 (default options). Using 10-fold cross-validation, report the percentage of correctly classified instances and the confusion matrix for this data set. (5 points)
8. Weka: Under Classifier, select meta -> AdaBoostM1. Click on the classifier box and change the base classifier from DecisionStump to J48. Using 10-fold cross-validation, report the percentage of correctly classified instances and the confusion matrix for this data set. (5 points)
9. Weka: Which of the three wine classes received the largest benefit from using boosting, in terms of its true positive rate? (5 points)
10. Describe one idea, algorithm, or insight you gained in CS 461 that you think can be helpful for you in future classes or work. (5 points)
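For the Weka true positive rate question above, the per-class true positive rate can be read directly off a confusion matrix: it is the number of instances of a class that were classified correctly, divided by the total number of instances of that class. The minimal sketch below assumes the matrix is entered with rows as actual classes and columns as predicted classes (the layout Weka uses under "<-- classified as"); the function names are hypothetical helpers, not part of Weka. Weka's own output also lists a TP Rate per class in its detailed accuracy section, which should agree with this computation.

```python
from typing import List, Sequence

# Rows are actual classes, columns are predicted classes (assumed layout).
Matrix = Sequence[Sequence[int]]


def tp_rates(confusion: Matrix) -> List[float]:
    """Return the true positive rate (recall) of each class.

    The TP rate of class i is confusion[i][i] / sum(confusion[i]).
    """
    rates = []
    for i, row in enumerate(confusion):
        total = sum(row)
        # Guard against a class that has no instances at all.
        rates.append(row[i] / total if total else 0.0)
    return rates


def largest_tp_gain(baseline: Matrix, boosted: Matrix) -> int:
    """Return the index of the class whose TP rate rose the most
    from the baseline matrix to the boosted matrix (first index on ties)."""
    gains = [b - a for a, b in zip(tp_rates(baseline), tp_rates(boosted))]
    return max(range(len(gains)), key=lambda i: gains[i])
```

To use it, copy the J48 and AdaBoostM1 confusion matrices row by row into two nested lists and pass them to `largest_tp_gain`; index 0 corresponds to the first class listed in the matrix.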

What to turn in

Upload this file to CSNS under "Homework 5":

<yourlastname>-hw5.txt (or <yourlastname>-hw5.pdf if you prefer to submit in PDF format)

Extra credit

If you wish to tackle some extra credit, you may do so here. You can earn up to 5 points to be applied to any of your homework assignments (but not to exceed 100 on any assignment). To receive these points, you must get at least a 70% on the main part of Homework 5, so finish the regular assignment before moving on to this part.

The Alpaydin book describes more ensemble methods than we were able to cover in class. In your own words, explain how stacked generalization (Section 15.7) and cascading (Section 15.8) work. (5 points)

What to turn in

Upload this file to CSNS under "Homework 5: Extra Credit":

<yourlastname>-hw5-extra.txt (or <yourlastname>-hw5-extra.pdf if you prefer to submit in PDF format)