15.2.1.2 The continuous case

Now consider adapting to the continuous case. Suppose that $ X$ and $ U$ are both continuous, but the stages remain discrete. It is straightforward to verify that (15.5) to (15.8) still hold; in their present form they apply to any system that is approximated by discrete stages. Suppose that the discrete-time model of Section 14.2.2 is used to approximate a system $ {\dot x}= f(x,u)$ on a state space $ X$ that is a smooth manifold. In that model, $ U$ was discretized to $ U_d$, but here it will be left in its original form. Let $ \Delta t$ denote the time discretization.
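To make the discrete-time model concrete, the following minimal sketch realizes one stage as an Euler step $ x' = x + f(x,u)\,\Delta t$. The particular dynamics, step size, and function names are illustrative choices, not prescribed by the text.

```python
def euler_step(x, u, f, dt):
    """One stage of a discrete-time model: x' = x + f(x, u) * dt."""
    return x + f(x, u) * dt

# Hypothetical scalar dynamics xdot = u, integrated for one second.
f = lambda x, u: u
x = 0.0
for k in range(10):                  # ten discrete stages, dt = 0.1
    x = euler_step(x, 1.0, f, dt=0.1)
print(x)                             # approximately 1.0
```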

The HJB equation will be obtained by approximating (15.6) with the discrete-time model and letting $ \Delta t$ approach zero. The arguments here are very informal; see [95,570,912] for more details. Using the discrete-time approximation, the dynamic programming recurrence is

$\displaystyle G^*(x) = \min_{u \in U(x)} \left\{ l_d(x,u) + G^*(x') \right\} ,$ (15.9)

in which $ x'$ is the next state obtained by applying $ u$ for one stage of duration $ \Delta t$, and $ l_d$ is a discrete-time approximation to the cost that accumulates over stage $ k$ and is given as

$\displaystyle l_d(x,u) \approx l(x,u) \Delta t .$ (15.10)

It is assumed that as $ \Delta t$ approaches zero, the total discretized cost converges to the integrated cost of the continuous-time formulation.
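This convergence assumption can be checked numerically on a simple case. The sketch below sums the stage costs $ l_d = l \Delta t$ along a hypothetical trajectory $ x(t) = t$ with $ l(x,u) = x^2$, for which the integrated cost over $ [0,1]$ is exactly $ 1/3$; the running cost and trajectory are illustrative choices only.

```python
import numpy as np

# Sum of l_d = l * dt along x(t) = t with l(x, u) = x^2; the exact
# integrated cost over [0, 1] is 1/3.
for dt in [0.1, 0.01, 0.001]:
    ts = np.arange(0.0, 1.0, dt)     # stage times
    total = np.sum(ts**2 * dt)       # accumulated discretized cost
    print(dt, total)                 # approaches 1/3 as dt -> 0
```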

Using the linear part of a Taylor series expansion about $ x$, the term $ G^*(x')$ can be approximated as

$\displaystyle G^*(x') \approx G^*(x) + \sum_{i = 1}^n \frac{\partial G^*}{\partial x_i} f_i(x,u) \Delta t .$ (15.11)

This approximates $ G^*(x')$ by its tangent plane at $ x$. Substitution of (15.11) and (15.10) into (15.9) yields

$\displaystyle G^*(x) \approx \min_{u \in U(x)} \left\{ l(x,u)\, \Delta t + G^*(x) + \sum_{i = 1}^n \frac{\partial G^*}{\partial x_i} f_i(x,u) \,\Delta t \right\} .$ (15.12)
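Before simplifying further, it is worth checking that the first-order expansion used in (15.11), and hence in (15.12), is accurate to order $ \Delta t^2$. The sketch below does this for a hypothetical cost-to-go $ G^*(x) = x^2$ under the dynamics $ f(x,u) = u$; both are chosen only for illustration.

```python
# Tangent-plane check of (15.11): G*(x + u*dt) vs. its linearization.
G = lambda x: x**2                      # hypothetical cost-to-go
x, u = 1.5, -1.0
for dt in [0.1, 0.01, 0.001]:
    exact = G(x + u * dt)
    linear = G(x) + 2.0 * x * u * dt    # dG*/dx = 2x
    print(dt, abs(exact - linear))      # error shrinks like dt^2
```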

Since $ G^*(x)$ does not depend on $ u$, subtracting it from both sides of (15.12) yields

$\displaystyle \min_{u \in U(x)} \left\{ l(x,u)\,\Delta t + \sum_{i = 1}^n \frac{\partial G^*}{\partial x_i} f_i(x,u) \,\Delta t \right\} \approx 0 .$ (15.13)

Dividing both sides by $ \Delta t$ and then taking the limit as $ \Delta t$ approaches zero yields the HJB equation:

$\displaystyle \min_{u \in U(x)} \left\{ l(x,u) + \sum_{i = 1}^n \frac{\partial G^*}{\partial x_i} f_i(x,u) \right\} = 0 .$ (15.14)
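As a sanity check, consider a simple case for which $ G^*$ is known in closed form: a minimum-time problem with $ {\dot x} = u$, $ u \in [-1,1]$, $ l(x,u) = 1$, and the goal at the origin, so that $ G^*(x) = \vert x\vert$. The sketch below verifies numerically that this $ G^*$ satisfies (15.14) away from the goal; the example is a hypothetical one, not taken from the text.

```python
import numpy as np

# Minimum-time problem: xdot = u, u in [-1, 1], l(x, u) = 1, goal x = 0.
# The optimal cost-to-go is G*(x) = |x|, so dG*/dx = sign(x) for x != 0.
us = np.linspace(-1.0, 1.0, 201)
for x in [-2.0, -0.5, 1.0, 3.0]:
    dG = np.sign(x)
    residual = np.min(1.0 + dG * us)   # left side of (15.14)
    print(x, residual)                 # 0.0 for every x != 0
```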

Compare the HJB equation (15.14) to (15.6) for the discrete-time case. Both indicate how the cost changes when moving in the best direction. Substituting the optimal action $ u^*$ into (15.14) yields

$\displaystyle \sum_{i = 1}^n \frac{\partial G^*}{\partial x_i} f_i(x,u^*) = -l(x,u^*) .$ (15.15)

This is just the continuous-time version of (15.8). In the current setting, the left side indicates the derivative of the cost-to-go function along the direction obtained by applying the optimal action from $ x$.
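Continuing the hypothetical minimum-time example above, (15.15) can be verified directly. With $ G^*(x) = \vert x\vert$, the optimal action is $ u^* = -\operatorname{sign}(x)$, and since $ f(x,u) = u$,

$\displaystyle \frac{\partial G^*}{\partial x} f(x,u^*) = \operatorname{sign}(x) \cdot (-\operatorname{sign}(x)) = -1 = -l(x,u^*) ,$

which confirms that the cost-to-go decreases at unit rate along the optimal motion.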

The HJB equation, together with a boundary condition that specifies the final-stage cost, sufficiently characterizes the optimal solution to the planning problem. Since it is expressed over the whole state space, solutions to the HJB equation yield optimal feedback plans. Unfortunately, the HJB equation cannot be solved analytically in most settings. Therefore, numerical techniques, such as the value iteration method of Section 14.5, must be employed. There is, however, an important class of problems that can be directly solved using the HJB equation; see Section 15.2.2.
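To illustrate what such a numerical solution looks like, the following is a minimal value-iteration sketch in the spirit of Section 14.5, applied to the one-dimensional minimum-time example used above. The grid resolution, time step, action samples, and linear interpolation are all illustrative choices, not a method prescribed by the text.

```python
import numpy as np

# Value iteration for the recurrence (15.9) on the minimum-time example:
# xdot = u, u in {-1, 0, 1}, l(x, u) = 1, goal at x = 0, so G*(x) = |x|.
xs = np.linspace(-1.0, 1.0, 201)       # state grid
dt = 0.01                              # matches the grid spacing
G = np.full_like(xs, 10.0)             # large initial cost-to-go
goal = np.argmin(np.abs(xs))           # grid index of x = 0
G[goal] = 0.0

for _ in range(500):
    for u in (-1.0, 0.0, 1.0):
        x_next = np.clip(xs + u * dt, xs[0], xs[-1])
        G_next = np.interp(x_next, xs, G)        # interpolate G*(x')
        G = np.minimum(G, 1.0 * dt + G_next)     # l_d = l * dt = dt
    G[goal] = 0.0                                # boundary condition

print(np.max(np.abs(G - np.abs(xs))))  # small: G converges to |x|
```

Each sweep applies (15.9) over the entire grid; the boundary condition at the goal anchors the solution, and the computed cost-to-go approaches $ \vert x\vert$.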
