id: 6eaac087476a4719b7c200e81a51767c
parent_id: 6760973cba81403a883bc2e1b159f292
item_type: 1
item_id: 6b9a556874c748f3b53f470c32c9243b
item_updated_time: 1671391081722
title_diff: "[]"
body_diff: "[{\"diffs\":[[0,\" 1 < n <\"],[1,\"=\"],[0,\" 4\\\np(n+1\"]],\"start1\":2938,\"start2\":2938,\"length1\":16,\"length2\":17},{\"diffs\":[[0,\"or n < 4\"],[1,\"\\\n\\\nbla bla slightly more complicated, easier to do on paper\\\n\\\ng) probably\\\n\\\nexercise 2\\\n\\\na)\\\n\\\nV + phi = max_a {r_a + P_a V}\\\n\\\nsolve using policy iteration\\\ntake a = (1,2) as start, for instance ;)\\\n\\\nsolve PE:\\\n\\\nV(1) + phi = 3 + 0.5V(1) + 0.5V(2)\\\nV(2) + phi = 0 + 0.5V(1) + 0.5V(2)\\\n\\\nlet V(2) = 0 for a change. then\\\n\\\nV(1) + phi = 3 + 0.5V(1)\\\nphi = 0.5V(1)\\\n\\\n1.5V(1) = 3 + 0.5V(1) => V(1) = 3 => phi = 1.5\\\n\\\n(V, phi) = ([3, 0], 1.5)\\\n\\\nnow let's try to find a better a\\\n\\\nonly a we can change is a(2)\\\n\\\na~(2) = argmax {1 + 0.5V(2), 0 + 0.5V(1) + 0.5V(2)} = argmax {1, 1.5} = 2\\\n\\\na~ = a, so we stop.\\\n\\\nb)\\\n\\\nstationary distribution eqs:\\\n\\\npi*(1) = 0.5pi*(1) + 0.5pi*(2)\\\npi*(2) = 0.5pi*(2) + 0.5pi*(1)\\\n\\\npi*(1) + pi*(2) = 1\\\n\\\npi*(1) = pi*(2) = 0.5\\\n\\\nthen, to get V*, we want a solution to the bellman eq s.t. = 0\\\n\\\nV* = [1.5, -1.5] succeeds; then V*(1) * pi*(1) + V*(2) * pi*(2) = 0.75 - 0.75 = 0\\\n\\\nexercise 3\\\n\\\n\"]],\"start1\":2968,\"start2\":2968,\"length1\":8,\"length2\":897}]"
metadata_diff: {"new":{},"deleted":[]}
encryption_cipher_text: 
encryption_applied: 0
updated_time: 2022-12-18T19:18:11.372Z
created_time: 2022-12-18T19:18:11.372Z
type_: 13