15.2.3 Pontryagin's Minimum Principle

Pontryagin's minimum principle is closely related to the HJB equation and provides conditions that an optimal trajectory must satisfy. Keep in mind, however, that the minimum principle provides necessary conditions, but not sufficient conditions, for optimality. In contrast, the HJB equation offered sufficient conditions. Using the minimum principle alone, one is often not able to conclude that a trajectory is optimal. In some cases, however, it is quite useful for finding candidate optimal trajectories. Any trajectory that fails to satisfy the minimum principle cannot be optimal.

To understand the minimum principle, we first return to the case of discrete planning. As mentioned previously, the minimum principle is essentially given by (15.7). This can be considered as a specialization of the HJB equation to the special case of applying the optimal action $ u^*$. This causes the $ \min$ to disappear, but along with it the global properties of the HJB equation also vanish. The minimum principle expresses conditions along the optimal trajectory, as opposed to the cost-to-go function over the whole state space. Therefore, it can at best assure local optimality in the space of possible trajectories.

The minimum principle for the continuous case is essentially given by (15.15), which is the continuous-time counterpart to (15.7). However, it is usually expressed in terms of adjoint variables and a Hamiltonian function, in the spirit of Hamiltonian mechanics from Section 13.4.4.

Let $ \lambda$ denote an $ n$-dimensional vector of adjoint variables, which are defined as

$\displaystyle \lambda_i = \frac{\partial G^*}{\partial x_i} .$ (15.25)

The Hamiltonian function is defined as

$\displaystyle H(x,u,\lambda) = l(x,u) + \sum_{i=1}^n \lambda_i f_i(x,u) ,$ (15.26)

which is exactly the expression inside of the $ \min$ of the HJB equation (15.14) after using the adjoint variable definition from (15.25). This can be compared to the Hamiltonian given by (13.192) in Section 13.4.4 ($ p$ from that context becomes $ \lambda$ here). The two are not exactly the same, but they both are motivated by the same basic principles.

Under the execution of the optimal action trajectory $ {\tilde{u}}^*$, the HJB equation implies that

$\displaystyle H(x(t),u^*(t),\lambda(t)) = 0$ (15.27)

for all $ t \geq 0$. This is just an alternative way to express (15.15). The fact that $ H$ remains constant appears very much like a conservation law, which was the basis of Hamiltonian mechanics in Section 13.4.4. The use of the Hamiltonian in the minimum principle is more general.

Using the HJB equation (15.14), the optimal action is given by

$\displaystyle u^*(t) = \operatornamewithlimits{argmin}_{u \in U(x)} \left\{ H(x(t),u(t),\lambda(t)) \right\}.$ (15.28)

In other words, the Hamiltonian is minimized precisely at $ u(t) =
u^*(t)$.
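
To make the minimization in (15.28) concrete, the following Python sketch builds the Hamiltonian (15.26) for a hypothetical scalar system that is assumed for illustration only (it does not appear in this chapter): $ {\dot x}= u$ with running cost $ l(x,u) = x^2 + u^2$. Because the cost is strictly convex in $ u$ and the action set is left unconstrained in this sketch, the $ \operatornamewithlimits{argmin}$ can be found by setting $ \partial H/\partial u = 0$.

import sympy as sp

# Assumed example (not from the text): xdot = u with running cost l = x^2 + u^2.
x, u, lam = sp.symbols('x u lam', real=True)
l = x**2 + u**2              # running cost l(x, u)
f = u                        # state transition equation: xdot = f(x, u)
H = l + lam * f              # Hamiltonian from (15.26): H = l + lambda * f

# For this unconstrained, convex-in-u case, the argmin in (15.28) is obtained
# by solving dH/du = 0 for u.
u_star = sp.solve(sp.Eq(sp.diff(H, u), 0), u)[0]
print(u_star)                # prints -lam/2

This recovers $ u^* = -\lambda/2$, which expresses the optimal action in terms of the adjoint variable, exactly as (15.28) prescribes.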

The missing piece of information so far is how $ \lambda$ evolves over time. It turns out that a system of the form

$\displaystyle {\dot \lambda}= g(x,\lambda,u^*)$ (15.29)

can be derived by differentiating the Hamiltonian (or, equivalently, the HJB equation) with respect to $ x$. This yields two coupled systems, $ {\dot x}= f(x,u^*)$ and (15.29). These can in fact be interpreted as a single system in a $ 2n$-dimensional phase space, in which each phase vector is $ (x,\lambda)$. This is analogous to the phase interpretation in Section 13.4.4 for Hamiltonian mechanics, which results in (13.198).

Remember that $ \lambda$ is defined in (15.25) just to keep track of the change in $ G^*$. It would be helpful to have an explicit form for (15.29). Suppose that $ u^*$ is selected by a feedback plan to yield $ u^* = \pi ^*(x)$. In this case, the Hamiltonian can be interpreted as a function of only $ x$ and $ \lambda$. Under this assumption, differentiating the Hamiltonian (15.26) with respect to $ x_i$ yields

$\displaystyle \frac{\partial l(x,\pi ^*(x))}{\partial x_i} + \sum_{j=1}^n \frac{\partial \lambda_i}{\partial x_j} f_j(x,\pi ^*(x)) + \sum_{j=1}^n \lambda_j \frac{\partial f_j(x,\pi ^*(x))}{\partial x_i} .$ (15.30)

The validity of this differentiation requires a technical lemma that asserts that the derivatives of $ \pi ^*(x)$ can be disregarded (see Lemma 3.3.1 of [95]). Also, it will be assumed that $ U$ is convex in the arguments that follow, even though there exist proofs of the minimum principle that do not require this.

The second term in (15.30) is actually $ {\dot \lambda}_i$, although it is hard to see at first (the chain rule produces $ \partial \lambda_j/\partial x_i$, which equals $ \partial \lambda_i/\partial x_j$ because both are second derivatives of $ G^*$). The total differential of $ \lambda_i$ with respect to the state is

$\displaystyle d\lambda_i = \sum_{j=1}^n \frac{\partial \lambda_i}{\partial x_j} dx_j .$ (15.31)

Dividing both sides by $ dt$ yields

$\displaystyle \frac{d\lambda_i}{dt} = \sum_{j=1}^n \frac{\partial \lambda_i}{\partial x_j} \frac{dx_j}{dt} = \sum_{j=1}^n \frac{\partial \lambda_i}{\partial x_j} {\dot x}_j .$ (15.32)

Each $ {\dot x}_j$ is given by the state transition equation: $ {\dot x}_j =
f_j(x,\pi ^*(x))$. Therefore,

$\displaystyle {\dot \lambda}_i = \frac{d\lambda_i}{dt} = \frac{d}{dt} \frac{\partial G^*}{\partial x_i} = \sum_{j=1}^n \frac{\partial \lambda_i}{\partial x_j} f_j(x,\pi ^*(x)) .$ (15.33)

Substituting (15.33) into (15.30) and setting the result to zero (under the optimal feedback plan, the HJB equation makes the Hamiltonian identically zero, so its derivative with respect to each $ x_i$ must also vanish) yields

$\displaystyle \frac{\partial l(x,\pi ^*(x))}{\partial x_i} + {\dot \lambda}_i + \sum_{j=1}^n \lambda_j \frac{\partial f_j(x,\pi ^*(x))}{\partial x_i} = 0 .$ (15.34)

Solving for $ {\dot \lambda}_i$ yields

$\displaystyle {\dot \lambda}_i = - \frac{\partial l(x,\pi ^*(x))}{\partial x_i} - \sum_{j=1}^n \lambda_j \frac{\partial f_j(x,\pi ^*(x))} {\partial x_i} .$ (15.35)

Conveniently, this is the same as

$\displaystyle {\dot \lambda}_i = -\frac{\partial H}{\partial x_i},$ (15.36)

which yields the adjoint transition equation, as desired.
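
Equation (15.36) gives a mechanical recipe for deriving the adjoint transition equation of any particular system. As a hypothetical illustration (the pendulum-like system below is assumed for this sketch and does not appear in the text), the following Python code forms the Hamiltonian for $ {\dot x}_1 = x_2$, $ {\dot x}_2 = u - \sin x_1$ with $ l(x,u) = 1$ and computes $ {\dot \lambda}_i = -\partial H/\partial x_i$ symbolically.

import sympy as sp

# Assumed example (not from the text): a minimum-time, pendulum-like system.
x1, x2, u, lam1, lam2 = sp.symbols('x1 x2 u lam1 lam2', real=True)
l = 1                              # running cost for minimum time
f1 = x2                            # xdot_1 = x2
f2 = u - sp.sin(x1)                # xdot_2 = u - sin(x1)
H = l + lam1 * f1 + lam2 * f2      # Hamiltonian from (15.26)

# Adjoint transition equation (15.36): lambdadot_i = -dH/dx_i.
lam1_dot = -sp.diff(H, x1)         # ->  lam2*cos(x1)
lam2_dot = -sp.diff(H, x2)         # -> -lam1
print(lam1_dot, lam2_dot)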

The transition equations $ {\dot x}= f(x,u)$ and (15.36) specify the evolution of the system under the minimum principle. These are analogous to Hamilton's equations (13.198), which were given in Section 13.4.4. The generalized momentum in that context corresponds to the adjoint variables here.
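
The coupled equations can also be integrated numerically as a single flow over phase vectors $ (x,\lambda)$. The sketch below continues the assumed scalar example from above ($ {\dot x}= u$, $ l = x^2 + u^2$, $ u^* = -\lambda/2$), for which the coupled system becomes $ {\dot x}= -\lambda/2$ and $ {\dot \lambda}= -2x$. Starting from an arbitrary phase vector, it checks that the minimized Hamiltonian $ H^*(x,\lambda) = x^2 - \lambda^2/4$ remains constant along the flow, in the spirit of the conservation-law remark following (15.27).

import numpy as np
from scipy.integrate import solve_ivp

# Assumed example (not from the text): xdot = u, l = x^2 + u^2, so u* = -lam/2,
# giving the coupled phase-space flow xdot = -lam/2, lamdot = -dH/dx = -2x.
def phase_flow(t, y):
    x, lam = y
    return [-lam / 2.0, -2.0 * x]

def minimized_hamiltonian(x, lam):
    # H*(x, lam) = x^2 - lam^2/4 after substituting u* = -lam/2.
    return x**2 - lam**2 / 4.0

y0 = [1.0, 0.5]                          # arbitrary initial phase vector (x, lambda)
sol = solve_ivp(phase_flow, (0.0, 2.0), y0, rtol=1e-9, atol=1e-12)

H_values = minimized_hamiltonian(sol.y[0], sol.y[1])
print(H_values.max() - H_values.min())   # nearly zero: H is conserved along the flow

Note that $ H^*$ is constant but not necessarily zero here, because the initial $ \lambda$ was chosen arbitrarily rather than from the cost-to-go function as in (15.25).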

When applying the minimum principle, one usually relies on the fact that the optimal action must satisfy (15.28) at all times. Often, this is equivalently expressed as

$\displaystyle H(x(t),u^*(t),\lambda(t)) \leq H(x(t),u(t),\lambda(t)) ,$ (15.37)

which indicates that the Hamiltonian increases or remains the same whenever deviation from the optimal action occurs (the Hamiltonian cannot decrease).

Example 15.4 (Optimal Planning for the Double Integrator)   Recall the double integrator system from Example 13.3. Let $ {\ddot q}= u$, $ {\cal C}= {\mathbb{R}}$, and $ U = [-1,1] \cup \{u_T\}$. Imagine a particle that moves in $ {\mathbb{R}}$. The action is a force in either direction and has at most unit magnitude. The state transition equation is $ {\dot x}_1 = x_2$ and $ {\dot x}_2 = u$, and $ X = {\mathbb{R}}^2$. The task is to perform optimal motion planning between any two states $ {x_{I}}, {x_{G}}\in X$. From a given initial state $ {x_{I}}$, a goal state $ {x_{G}}$ must be reached in minimum time. The cost functional is defined in this case as $ l(x,u) = 1$ for all $ x \in X$ and $ u \in U$ such that $ u \not = u_T$.

Using (15.26), the Hamiltonian is defined as

$\displaystyle H(x,u,\lambda) = 1 + \lambda_1 x_2 + \lambda_2 u .$ (15.38)

The optimal action trajectory is obtained from (15.28) as

$\displaystyle u^*(t) = \operatornamewithlimits{argmin}_{u \in [-1,1]} \left\{ 1 + \lambda_1(t) x_2(t) + \lambda_2(t) u(t)\right\}.$ (15.39)

If $ \lambda_2(t) < 0$, then $ u^*(t) = 1$, and if $ \lambda_2(t) > 0$, then $ u^*(t) = -1$. Thus, the action may be assigned as $ u^*(t) =
-{\rm sgn}(\lambda_2(t))$, if $ \lambda_2(t) \not = 0$. Note that these two cases are the ``bangs'' of the bang-bang control from Section 14.6.3, and they are also the extremal actions used for the planning algorithm in Section 14.4.1. At the boundary case in which $ \lambda_2(t) = 0$, any action in $ [-1,1]$ may be chosen.

The only remaining task is to determine the values of the adjoint variables over time. The adjoint transition equation is obtained from (15.36) as $ {\dot \lambda}_1 = 0$ and $ {\dot \lambda}_2 = -\lambda_1$. The solutions are $ \lambda_1(t) = c_1$ and $ \lambda_2(t) = c_2 - c_1 t$, in which $ c_1$ and $ c_2$ are constants that can be determined at $ t=0$ from (15.38) and (15.39). The optimal action depends only on the sign of $ \lambda_2(t)$. Since $ \lambda_2(t)$ is a linear function of time, it can change sign at most once. Therefore, there are four possible kinds of solutions, depending on the particular $ {x_{I}}$ and $ {x_{G}}$:

  1. Pure acceleration, $ u^*(t) = 1$, is applied for all time.
  2. Pure deceleration, $ u^*(t) = -1$, is applied for all time.
  3. Pure acceleration is applied up to some time $ t'$ and is followed immediately by pure deceleration until the final time.
  4. Pure deceleration is applied up to some time $ t'$ followed immediately by pure acceleration until the final time.
For the last two cases, $ t'$ is often called the switching time, at which point a discontinuity in $ {\tilde{u}}^*$ occurs. These two are bang-bang solutions, which were described in Section 14.6.3. $ \blacksquare$
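
The bang-bang structure can be verified numerically. The following Python sketch treats a rest-to-rest instance that is assumed for illustration (it is not worked out in the text): move from $ {x_{I}}= (0,0)$ to $ {x_{G}}= (d,0)$ with $ d > 0$. For this instance the switching time is $ t' = \sqrt{d}$, the minimum time is $ 2\sqrt{d}$, and choosing $ c_1 = -1/\sqrt{d}$ and $ c_2 = -1$ makes the Hamiltonian (15.38) vanish along the trajectory while $ u^*(t) = -{\rm sgn}(\lambda_2(t))$ switches from acceleration to deceleration, as in the third case above.

import numpy as np

# Assumed rest-to-rest instance of Example 15.4 (not from the text): move from
# (0, 0) to (d, 0) with d > 0 in minimum time.
d = 2.0
T = 2.0 * np.sqrt(d)                  # minimum time; the switch occurs at t' = sqrt(d)
c1 = -1.0 / np.sqrt(d)                # lambda_1(t) = c1 (constant)
c2 = -1.0                             # lambda_2(0) = c2, chosen so that H = 0 at t = 0

dt = 1e-4
x1, x2 = 0.0, 0.0
H_max = 0.0
for t in np.arange(0.0, T, dt):
    lam2 = c2 - c1 * t                # lambda_2(t) = c2 - c1 t
    u = 1.0 if lam2 < 0.0 else -1.0   # u*(t) = -sgn(lambda_2(t)), from (15.39)
    H_max = max(H_max, abs(1.0 + c1 * x2 + lam2 * u))   # Hamiltonian (15.38)
    x1 += x2 * dt                     # xdot_1 = x2
    x2 += u * dt                      # xdot_2 = u

print(x1, x2, H_max)                  # x1 ~ d, x2 ~ 0, and H stays near zero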

This was one of the simplest possible examples, and the optimal solution was easily found because the adjoint variables are linear functions of time. Section 15.3 covers optimal solutions for the Dubins car, the Reeds-Shepp car, and the differential drive, all of which can be established using the minimum principle combined with some geometric arguments. As systems become more complicated, such analysis is unfortunately too difficult. In these cases, sampling-based methods, such as those of Chapter 14, must be used to determine optimal trajectories.

One common complication is the existence of singular arcs along the solution trajectory. These correspond to a degeneracy in $ H$ with respect to $ u$ over some duration of time. This could be caused, for example, by having $ H$ independent of $ u$. In Example 15.4, $ H$ became independent of $ u$ when $ \lambda_2(t) = 0$; however, there was no singular arc because this could only occur for an instant of time. If the duration had been longer, then there would be an interval of time over which the optimal action could not be determined. In general, if the Hessian (recall definition from (8.48)) of $ H$ with respect to $ u$ is a positive definite matrix, then there are no singular arcs (this is often called the Legendre-Clebsch condition). The minimum principle in this case provides a sufficient condition for local optimality in the space of possible state trajectories. If the Hessian is not positive definite for some interval $ [t_1,t_2]$ with $ t_1 < t_2$, then additional information is needed to determine the optimal trajectory over the singular arc from $ x^*(t_1)$ to $ x^*(t_2)$.
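
The Legendre-Clebsch condition can be checked directly for a particular system. The short sketch below (with assumed examples, not from the text) computes $ \partial^2 H/\partial u^2$ with sympy for two cases: the quadratic-cost scalar system used in the earlier sketches, where the Hessian is the constant $ 2 > 0$ and singular arcs are ruled out, and the time-optimal double integrator of Example 15.4, where $ H$ is linear in $ u$, the Hessian is identically zero, and the condition therefore gives no information.

import sympy as sp

x1, x2, u, lam, lam1, lam2 = sp.symbols('x1 x2 u lam lam1 lam2', real=True)

# Case 1 (assumed example): xdot = u with l = x^2 + u^2.
H1 = x1**2 + u**2 + lam * u
print(sp.diff(H1, u, 2))    # 2  -> positive definite, so no singular arcs

# Case 2: time-optimal double integrator, Hamiltonian (15.38) from Example 15.4.
H2 = 1 + lam1 * x2 + lam2 * u
print(sp.diff(H2, u, 2))    # 0  -> the Legendre-Clebsch condition fails to hold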

Note that all of this analysis ignores the existence of obstacles. There is nothing to prevent the solutions from attempting to enter an obstacle region. The action set $ U(x)$ and cost $ l(x,u)$ can be adjusted to account for obstacles; however, determining an optimal solution from the minimum principle becomes virtually impossible, except in some special cases.

There are other ways to derive the minimum principle. Recall from Section 13.4.4 that Hamilton's equations can be derived from the Euler-Lagrange equation. It should not be surprising that the minimum principle can also be derived using variational principles [95,789]. The minimum principle can also be interpreted as a form of constrained optimization. This yields the interpretation of $ \lambda$ as Lagrange multipliers. An illuminating reference for further study of the minimum principle is Pontryagin's original work [801].


