A simulation-based policy iteration algorithm can be derived using Q-factors. Recall from Section 10.2.2 that methods are needed to: 1) evaluate a given plan, $\pi$, and 2) improve the plan by selecting better actions. The plan evaluation previously involved solving a system of linear equations. Now any plan, $\pi$, can be evaluated without even knowing $P(x' \mid x,u)$ by using the methods of Section 10.4.2. Once $\hat{G}_\pi(x)$ is computed reliably from every $x \in X$, further simulation can be used to compute $Q_\pi(x,u)$ for each $x \in X$ and $u \in U(x)$. This can be achieved by defining a version of (10.99) that is constrained to $\pi$:

$$Q_\pi(x,u) = l(x,u) + \sum_{x' \in X} P(x' \mid x,u) \, G_\pi(x').$$
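
The two steps, evaluating a plan by simulation and then improving it from the resulting Q-factors, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the book's implementation. The three-state toy problem, the `simulate` function, the action names, and the sample counts are all hypothetical. Plain Monte Carlo averaging of trajectory costs stands in for the methods of Section 10.4.2, and the improvement step is assumed to pick, in each state, the action with the smallest estimated Q-factor. The algorithm only calls the simulator and never reads the transition probabilities inside it.

```python
import random

# Hypothetical toy problem: states 0, 1, 2 and a goal state 3 that ends the trial.
GOAL = 3
STATES = [0, 1, 2]
ACTIONS = ["safe", "risky"]   # assumed identical action set U(x) for every x


def simulate(x, u, rng):
    """Black-box simulator returning (cost l(x,u), next state x')."""
    if u == "safe":
        return 2.0, x + 1            # costly, but always advances
    if rng.random() < 0.8:
        return 1.0, x + 1            # cheap and usually advances
    return 1.0, x                    # cheap, but sometimes stays in place


def rollout_cost(x, plan, rng):
    """Total cost of one simulated trajectory that follows `plan` to the goal."""
    total = 0.0
    while x != GOAL:
        cost, x = simulate(x, plan[x], rng)
        total += cost
    return total


def evaluate_plan(plan, rng, n=2000):
    """Estimate the cost-to-go of `plan` from each state by averaging n rollouts."""
    G = {GOAL: 0.0}
    for x in STATES:
        G[x] = sum(rollout_cost(x, plan, rng) for _ in range(n)) / n
    return G


def q_factors(G, rng, n=2000):
    """Estimate Q(x,u) as the sample average of l(x,u) + G(x')."""
    Q = {}
    for x in STATES:
        for u in ACTIONS:
            total = 0.0
            for _ in range(n):
                cost, x_next = simulate(x, u, rng)
                total += cost + G[x_next]
            Q[x, u] = total / n
    return Q


def policy_iteration(plan, rng, max_iters=20):
    """Alternate evaluation and improvement until the plan stops changing."""
    for _ in range(max_iters):
        G = evaluate_plan(plan, rng)
        Q = q_factors(G, rng)
        new_plan = {x: min(ACTIONS, key=lambda u: Q[x, u]) for x in STATES}
        if new_plan == plan:
            break
        plan = new_plan
    return plan, G, Q


rng = random.Random(0)
plan, G, Q = policy_iteration({x: "safe" for x in STATES}, rng)
print(plan)
```

In this toy problem every plan reaches the goal with probability one, so the rollouts terminate; a problem without that property would need a cap on trajectory length or a discount factor. Starting from the all-"safe" plan, the iteration is expected to switch to "risky" in every state, since the risky action costs 1/0.8 = 1.25 on average per advance compared with 2 for the safe action.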
Steven M LaValle 2020-08-14