The value-iteration method from Section 10.2.1 can be
applied without modification. In the first step, initialize
using (12.6). Using the notation for the new
problem, the dynamic programming recurrence, (10.39),
becomes
The main difficulty in evaluating (12.7) is to
determine the set
, over which the
maximization occurs. Suppose that a state-nature sensor mapping is
used, as defined in Section 11.1.1. From the I-state
, the action
is applied. This
yields a forward projection
. The set of all
possible observations is
Steven M LaValle 2020-08-14