# CS 461 Homework 5

## Due: Midnight, March 13, 2008

Please your answers to these questions in `<yourlastname>-hw5.txt`

(or
PDF).

## Part 1: Reinforcement Learning (50 points)

How does reinforcement learning

**differ**from supervised and unsupervised learning, specifically in terms of**what information they use to learn**?**(5 points)**What do we mean by the

**tradeoff**between exploration and exploitation?**(5 points)**What is a

**policy**?**(5 points)**Give a (simple) example from your own life that could be modeled as a reinforcement learning problem (like my example of riding a bike or driving a car, but pick something else)

**(5 points)**. Define it as an RL problem by listing the states**(5 points)**, actions**(5 points)**, and how you would define the rewards**(5 points)**.If a reinforcement learner uses a gamma (discount rate) value of 0.1,

**what impact**does this have on its learning?**(5 points)****What is**the Markov property of a Markov Decision Process, and**why**do we use Markov Decision Processes to model reinforcement learning?**(5 points)**What nonlinear function approximator does TD-Gammon use to estimate the value of a board configuration (the probability of winning)?

**(5 points)**

## Part 2: Ensemble Learning (50 points)

What does the

**No Free Lunch**theorem state?**(5 points)**Is this set of codewords expressed as an error-correcting code? How can you tell?

**(5 points)**Word 1: T, F, F, T

Word 2: F, T, T, T

Word 3: F, T, F, F

How does an error-correcting output code detect errors made by (single) individual base learners?

**(10 points)**Devise a W matrix to create an error-correcting output code for the following task. You want to train 4 binary classifiers to distinguish between 3 classes. Therefore, your W matrix should have 3 rows and 4 columns; each entry should be +1 or -1.

**(10 points)**Why do we want to use "unstable" learners when bagging?

**(5 points)**Explain what is meant by a "weak" learner.

**(5 points)**What is the main difference between bagging and boosting?

**(5 points)**Describe one idea, algorithm, or insight you gained in this class that you think can be helpful for you in future classes or work.

**(5 points)**

## What to turn in

Upload this file to CSNS under "Homework 5":

`<yourlastname>-hw5.txt`

(or`<yourlastname>-hw5.pdf`

if you prefer to submit in PDF format)

## Extra credit

If you wish to tackle some extra credit, you may do so here. You can earn up to 10 points to be applied to any of your homework assignments (but not to exceed 100 on any assignment). To receive these points, you must get at least a 70% on the main part of Homework 5, so finish the regular assignment before moving on to this part.

Read over this paper by Konidaris and Barto from 2006. Explain what is meant by 1) "knowledge transfer" and 2) "shaping". How do the authors propose to enable reinforcement learning agents to identify their own shaping functions?

**(5 points)**The Alpaydin book describes more ensemble methods than we were able to cover in class.

**In your own words**, explain how**stacked generalization**(Section 15.7) and**cascading**(Section 15.8) work.**(5 points)**

## What to turn in

Upload this file to CSNS under "Homework 5: Extra Credit":

`<yourlastname>-hw5-extra.txt`

(or`<yourlastname>-hw5-extra.pdf`

if you prefer to submit in PDF format)