10.3 Infinite-Horizon Problems

In stochastic control theory and artificial intelligence research,
most problems considered to date do not specify a goal set.
Therefore, there are no associated termination actions. The task is
to develop a plan that minimizes the expected cost (or maximizes the
expected reward) over some number of stages. If the number of stages
is finite, then it is straightforward to apply the value iteration
method of Section 10.2.1. The adapted version of backward
value iteration simply terminates when the first stage is reached.
The problem becomes more challenging if the number of stages is
infinite. This is called an *infinite-horizon problem*.
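The finite-stage case can be illustrated with a minimal sketch of backward value iteration, assuming a probabilistic model in which the per-stage cost depends only on the state and action. The function name, its arguments, and the two-state example are hypothetical and chosen only for illustration; they are not the notation of Section 10.2.1.

```python
def finite_horizon_value_iteration(states, actions, transition, cost, num_stages):
    """Backward value iteration over a fixed number of stages.

    transition(x, u) returns a list of (probability, next_state) pairs.
    cost(x, u) returns the per-stage cost (an assumed simplification).
    Returns (G, plan), where G[k][x] is the optimal expected cost-to-go
    from stage k and plan[k][x] is an action that achieves it.
    """
    G = [dict() for _ in range(num_stages + 1)]
    plan = [dict() for _ in range(num_stages)]
    # No cost is incurred after the final stage (assumed zero final cost).
    for x in states:
        G[num_stages][x] = 0.0
    # Sweep backward; the loop ends once the first stage has been processed.
    for k in range(num_stages - 1, -1, -1):
        for x in states:
            best_u, best = None, float("inf")
            for u in actions(x):
                expected = cost(x, u) + sum(
                    p * G[k + 1][x2] for p, x2 in transition(x, u)
                )
                if expected < best:
                    best, best_u = expected, u
            G[k][x] = best
            plan[k][x] = best_u
    return G, plan


# Hypothetical two-state example: state 0 is expensive, state 1 is cheap.
states = [0, 1]

def actions(x):
    return ["stay", "switch"]

def transition(x, u):
    if u == "stay":
        return [(1.0, x)]
    return [(0.8, 1 - x), (0.2, x)]  # switching succeeds with probability 0.8

def cost(x, u):
    return (2.0 if x == 0 else 1.0) + (0.5 if u == "switch" else 0.0)

G, plan = finite_horizon_value_iteration(states, actions, transition, cost, 2)
```

In this sketch the best action depends on the stage: with two stages left it pays to switch out of state 0, whereas at the last stage switching only adds cost. Such stage dependence is typical of finite-horizon plans.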

The number of stages for the planning problems considered in Section
10.1 is also infinite; however, it was expected that if
the goal could be reached, termination would occur in a finite number
of iterations. If there is no termination condition, then the
accumulated cost generally tends to infinity as the number of stages
grows. There are two alternative cost models that force
the costs to become finite. The *discounted cost model* shrinks
the per-stage costs as the stages extend into the future; this yields
a geometric series for the total cost that converges to a finite
value. The *average cost-per-stage model* divides the total cost
by the number of stages. This essentially normalizes the accumulating
cost, once again preventing its divergence to infinity. Some of the
computation methods of Section 10.2 can be adapted to
these models. This section formulates these two infinite-horizon cost
models and presents computational solutions.
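As a rough numerical illustration of why both models stay finite, the following sketch evaluates them on a long but finite sequence of per-stage costs. The discount factor name `alpha`, the helper names, and the constant unit cost are assumptions made for this example only.

```python
def discounted_cost(stage_costs, alpha):
    """Sum of per-stage costs, with stage i weighted by alpha**i (0 < alpha < 1)."""
    return sum((alpha ** i) * c for i, c in enumerate(stage_costs))


def average_cost_per_stage(stage_costs):
    """Total cost divided by the number of stages."""
    return sum(stage_costs) / len(stage_costs)


# Hypothetical example: a constant cost of 1 at each of 1000 stages.
stage_costs = [1.0] * 1000
alpha = 0.9

total = sum(stage_costs)                         # grows without bound as stages are added
discounted = discounted_cost(stage_costs, alpha) # approaches 1 / (1 - alpha)
average = average_cost_per_stage(stage_costs)    # stays at the per-stage cost
```

For a constant per-stage cost c, the discounted sum is the geometric series c(1 + alpha + alpha^2 + ...), which converges to c / (1 - alpha), while the average remains c no matter how many stages are added.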