A simulation-based policy iteration algorithm can be derived using
Q-factors. Recall from Section 10.2.2 that methods are
needed to: 1) evaluate a given plan, $\pi$, and 2) improve the plan
by selecting better actions. The plan evaluation previously involved
solving linear equations. Now any plan, $\pi$, can be evaluated
without even knowing $P(x'|x,u)$ by using the methods of Section
10.4.2. Once $\hat{G}_\pi$ is computed reliably from every
$x \in X$, further simulation can be used to compute $Q_\pi(x,u)$
for each $x \in X$ and $u \in U(x)$. This can be achieved by
defining a version of (10.99) that is constrained to $\pi$:
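The evaluate-then-improve loop described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the book's implementation. The line-shaped state space, the `walk`/`run` actions, their success probabilities, and the helper names `simulate`, `evaluate_plan`, `estimate_q`, `improve`, and `policy_iteration` are all hypothetical. It assumes no discounting, a single terminal goal state reached with probability one under every plan, and a constrained relation of the form $Q_\pi(x,u) = l(x,u) + \sum_{x'} P(x'|x,u)\, G_\pi(x')$. Plan evaluation uses a TD(0) update with step size $1/n(x)$ (one possible choice of simulation method). $Q_\pi$ is then estimated by averaging $l(x,u) + \hat{G}_\pi(x')$ over simulated single steps, and the plan is improved greedily. The planner only calls `simulate` and never reads $P(x'|x,u)$.

```python
import random

# Hypothetical toy problem (an assumption, not from the text): states 0..GOAL
# on a line, GOAL is terminal, and the planner may only call the simulator.
GOAL = 5
ACTIONS = ("walk", "run")


def simulate(x, u, rng):
    """Black box: return (cost l(x,u), sampled next state x').

    Hypothetical dynamics: 'walk' advances 1 with prob 0.9, 'run' advances 2
    with prob 0.6; on failure the state stays put. Both cost 1 per stage.
    """
    step, p_success = (1, 0.9) if u == "walk" else (2, 0.6)
    x_next = min(x + step, GOAL) if rng.random() < p_success else x
    return 1.0, x_next


def evaluate_plan(pi, rng, episodes=3000):
    """Estimate G_pi(x) for every x by TD(0) with step size 1/(visits to x)."""
    G = [0.0] * (GOAL + 1)  # G[GOAL] stays 0: no cost after termination
    visits = [0] * GOAL
    for _ in range(episodes):
        x = rng.randrange(GOAL)  # random start so every state gets visited
        while x != GOAL:
            cost, x_next = simulate(x, pi[x], rng)
            visits[x] += 1
            G[x] += (cost + G[x_next] - G[x]) / visits[x]
            x = x_next
    return G


def estimate_q(G, rng, samples=1000):
    """Estimate Q_pi(x,u) = l(x,u) + E[G_pi(x')] by averaging simulated steps."""
    Q = {}
    for x in range(GOAL):
        for u in ACTIONS:
            total = 0.0
            for _ in range(samples):
                cost, x_next = simulate(x, u, rng)
                total += cost + G[x_next]
            Q[x, u] = total / samples
    return Q


def improve(Q):
    """Greedy improvement: pi'(x) = argmin over u of Q_pi(x,u)."""
    return [min(ACTIONS, key=lambda u: Q[x, u]) for x in range(GOAL)]


def policy_iteration(rng, max_iters=10):
    """Alternate simulated evaluation and greedy improvement."""
    pi = ["walk"] * GOAL  # initial plan
    G = evaluate_plan(pi, rng)
    for _ in range(max_iters):
        new_pi = improve(estimate_q(G, rng))
        if new_pi == pi:  # plan unchanged: stop
            break
        pi = new_pi
        G = evaluate_plan(pi, rng)  # keep G consistent with the returned pi
    return pi, G


if __name__ == "__main__":
    plan, cost_to_go = policy_iteration(random.Random(0))
    print(plan)
    print([round(g, 2) for g in cost_to_go])
```

Random starting states keep every state visited even when the current plan would skip some of them (for example, `run` jumps over states). For this toy model, an exact calculation gives `walk` at $x=4$ and `run` at $x=1$ and $x=3$. At $x=0$ and $x=2$ the two actions tie, so repeated improvement steps may alternate among equally good plans. `max_iters` caps that alternation.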
Steven M LaValle 2020-08-14