CMSC 471 - Homework #6

Out 11/17/11, due 12/8/11

You are encouraged to work on this homework assignment in groups of up to three students. If you work in a group, you only need to turn in one shared solution. All students in the group will receive the same grade on the assignment. If you choose to work in a group, you must actually produce group solutions, not have each member work independently on one part of the assignment, then submit the collection of independent solutions. To acknowledge that you have worked as intended, you must include the following statement at the top of the submitted assignment:

"We, <students' names>, worked equally as a group on this assignment, and each of us fully understands and acknowledges the group solutions that we are submitting. We understand that we will receive a common grade for this assignment."

Please be sure to include all group members' names on the assignment!

I. Decision tree learning (25 pts.)

The following table gives a data set for deciding whether to play or cancel a ball game, depending on the weather conditions.

Outlook	Temp (F)	Humidity (%)	Windy?	Class
sunny	75	70	true	Play
sunny	80	90	true	Don't Play
sunny	85	85	false	Don't Play
sunny	72	95	false	Don't Play
sunny	69	70	false	Play
overcast	72	90	true	Play
overcast	83	78	false	Play
overcast	64	65	true	Play
overcast	81	75	false	Play
rain	71	80	true	Don't Play
rain	65	70	true	Don't Play
rain	75	80	false	Play
rain	68	80	false	Play
rain	70	96	false	Play

Information Gain (10 pts.) At the root node for a decision tree in this domain, what would the information gain associated with a split on the Outlook attribute? What would it be for a split at the root on the Humidity attribute? (Use a threshold of 75 for humidity (i.e., assume a binary split: humidity <= 75 / humidity > 75.)
Gain Ratios (10 pts.) Again at the root node, what is the gain ratio associated with the Outlook attribute? What is the gain ratio for the Humidity attribute at the root (using the same threshold as in (a))?
Decision Tree (5 pts.) Suppose you build a decision tree that splits on the Outlook attribute at the root node. How many children nodes are there are at the first level of the decision tree? Which branches require a further split in order to create leaf nodes with instances belonging to a single class? For each of these branches, which attribute can you split on to complete the decision tree building process at the next level (i.e., so that at level 2, there are only leaf nodes)? Draw the resulting decision tree, showing the decisions (class predictions) at the leaves.

II. Learning Bayes Nets (15 pts.)

Consider an arbitrary Bayesian network, a complete data set for that network, and the likelihood for the data set according to the network. Give a clear explanation (i.e., an informal proof, in words) of why the likelihood of the data cannot decrease if we add a new link to the network and recompute the maximum-likelihood parameter values.

III. MDPs and RL (30 pts)

The diagram below shows a gridworld domain in which the agent starts at the upper left location. The upper and lower rows are both "one-way streets," since only the actions shown by arrows are available.

Actions that attempt to move the agent into a wall (the outer borders, or the thick black wall between all but the leftmost cell of the top and bottom rows) leave the agent in the same state it was in with probability 1, and have reward -2. If the agent tries to move to the right from the upper right or lower right locations, with probability 1, it is teleported to the far left end of the corresponding row, with reward as marked (-10 and +20, respectively). All other actions have the expected effect (move up, down, left, or right) with probability .9, and leave the agent in the same state it was in with probability .1. These actions all have reward -1, except for the transitions that are marked in the upper left and lower left cells. (Note that the marked transitions only give the indicated reward if the action succeeds in moving the agent in that direction.)

(a) MDP (10 pts) Give the MDP for this domain only for the state transitions starting from each of the states in the top row, by filling in a state-action-state transition table (showing only the state transitions with non-zero probability). You should refer to each state by its row and column index, so the upper left state is [1,1] and the lower right state is [2,4].

To get you started, here are the first few lines of the table:

State s Action a New state s' p(s'|s,a) r(s,a,s')
[1,1] Up [1,1] 1.0 -2
[1,1] Right [1,1] 0.1 -1
[1,1] Right [1,2] 0.9 +20

(b) Value function (10 pts) Suppose the agent follows a randomized policy π (where each available action in any given state has equal probability) and uses a discount factor of γ=.85. Given the partial value function (V^π; U^π in Russell & Norvig's terminology) shown below, fill in the missing V^π values. Show and explain your work.

(c) Policy (10 pts) Given the value function V^π computed in (b), what new policy π' would policy iteration produce at the next iteration? Show your answer as a diagram (arrows on the grid) or as a state-action table.

IV. Multi-Agent Systems (30 pts)

Game Analysis (15 pts) For each of the three normal-form two-player games below, identify the Nash equilibria (if there are any). Explain why these strategy sets are the Nash equilibria of the game, or why no Nash equilibria exist if this is the case. Also indicate the social-welfare-maximizing strategy sets for each game. Given the Nash equilibria that you've identified, will rational players choose the social-welfare-maximizing strategy set?
(a) Iterated Prisoner's Dilemma (5 pts)

C D

C 3,3 0,5

D 5,0 1,1

(b) Rock-Paper-Scissors (5 pts)
Also called "Roshambo," each player chooses to present one of three objects: rock, paper, or scissors. Rock breaks (beats) scissors; paper covers (beats) rock; scissors cuts (beats) paper. Nobody wins (it's a tie) if both players pick the same object.

R P S

R 0,0 -1,+1 +1,-1

P +1,-1 0,0 -1,+1

R -1,+1 +1,-1 0,0

(c) Chicken (5 pts)
Two drivers are headed for a one-lane bridge. If they both swerve out of the way, they "tie" (nobody scores). If one swerves and the other drives straight on, the "chicken" loses a point and the gutsy driver gains a point. If neither swerves, they both lose big.

Straight Swerve

Straight -10,-10 +1,-1

Swerve -1,+1 0,0
Game Creation (15 points) Create your own game for two or more players by specifying a normal form table. Explain the game in English (you don't have to have a great story, but it shouldn't just consist of random payoffs).
State the Nash equilibria of your game and explain why they are the equilibria. Also indicate what the social welfare maximizing strategy sets are for your game. Will rational players maximize social welfare in your game?
Do you think that the "tit-for-tat" strategy, which has been used successfully with the IPD, would work well for a player in your game? Why or why not?

State s	Action a	New state s'	p(s'\|s,a)	r(s,a,s')
[1,1]	Up	[1,1]	1.0	-2
[1,1]	Right	[1,1]	0.1	-1
[1,1]	Right	[1,2]	0.9	+20

	C	D
C	3,3	0,5
D	5,0	1,1

	R	P	S
R	0,0	-1,+1	+1,-1
P	+1,-1	0,0	-1,+1
R	-1,+1	+1,-1	0,0

	Straight	Swerve
Straight	-10,-10	+1,-1
Swerve	-1,+1	0,0