Solving the active localization problem

From the previous two examples, it should be clear how to compute nondeterministic I-states and therefore solve the passive localization problem on a grid. Now consider constructing a plan that solves the active localization problem. Imagine using a computer to help in this task. There are two general approaches:

  1. Precomputed Plan: In this approach, a planning algorithm running on a computer accepts a map of the environment and computes an information-feedback plan that indicates which action to take from every possible I-state that could result (a derived I-space could be used). During execution, the actions are immediately determined from the stored, precomputed plan.
  2. Lazy Plan: In this case the map is still given, but the appropriate action is computed just as it is needed during each stage of execution. The computer runs on board the robot and must compute which action to take based on the current I-state.
The issues are similar to those of the sampling-based roadmap in Section 5.6. If faster execution is desired, then the precomputed plan may be preferable. If it would consume too much time or space, then a lazy plan may be preferable.
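To make the trade-off concrete, the following Python sketch (a hypothetical interface, not from the text; all class and parameter names are illustrative) contrasts where the computation happens in each approach.

```python
# Hypothetical interfaces contrasting the two approaches.

class PrecomputedPlan:
    """The entire information-feedback plan is computed offline and
    stored as a lookup table over (derived) I-states."""
    def __init__(self, table):
        self.table = table              # I-state -> action, built in advance

    def action(self, i_state):
        return self.table[i_state]      # constant-time lookup at execution


class LazyPlan:
    """Only the current action is computed, on board, as it is needed."""
    def __init__(self, planner, world_map):
        self.planner = planner          # search routine run at each stage
        self.world_map = world_map

    def action(self, i_state):
        return self.planner(i_state, self.world_map)   # compute on demand
```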

Using either approach, it will be helpful to recall the formulation of Section 12.1.1, which considers $ {\cal I}_{ndet}$ as a new state space, $ {\vec{X}}$, in which state feedback can be used. Even though there are no nature sensing actions, the observations are not predictable because the state is generally unknown. This means that $ {\vec{\theta}}$ is unknown, and future new states, $ {\vec{x}}_{k+1}$, are unpredictable once $ {\vec{x}}_k$ and $ {\vec{u}}_k$ are given. A plan must therefore use feedback, which means that it needs information learned during execution to solve the problem. The state transition function $ {\vec{f}}$ on the new state space was illustrated for the localization problem in Examples 12.1 and 12.2. The initial state $ {\vec{x}}_I$ is the set of all original states. If there are no symmetries, the goal set $ {\vec{X}}_G$ is the set of all singleton subsets of $ X$; otherwise, it is the set of all smallest possible I-states that are reachable (this does not need to be constructed in advance). If desired, cost terms can be defined to produce an optimal planning problem. For example, $ {\vec{l}}({\vec{x}},{\vec{u}}) = 2$ if a motion occurs, or $ {\vec{l}}({\vec{x}},{\vec{u}}) = 1$ otherwise.
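As a concrete illustration, the sketch below computes the transition on $ {\vec{X}}$ for a grid, representing a nondeterministic I-state as a set of possible cells. The sensing model is an assumption made here for concreteness (after each action the robot observes whether the move was blocked); the names MOVES, attempt, f_hat and the tiny corridor map are likewise illustrative. Examples 12.1 and 12.2 define the precise models.

```python
# A minimal sketch of the I-state transition over a grid. Assumption:
# the robot observes after each action whether it was blocked; a blocked
# robot stays in its cell. I-states are frozensets of possible cells.

MOVES = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}

def attempt(cell, u, free):
    """Result of applying action u from one cell (stay put if blocked)."""
    nxt = (cell[0] + MOVES[u][0], cell[1] + MOVES[u][1])
    return nxt if nxt in free else cell

def f_hat(x_hat, u, y_blocked, free):
    """Next I-state: keep only possibilities consistent with observation."""
    if y_blocked:
        return frozenset(c for c in x_hat if attempt(c, u, free) == c)
    return frozenset(attempt(c, u, free) for c in x_hat
                     if attempt(c, u, free) != c)

# A 1x3 corridor; initially the robot could be in any free cell.
free = {(0, 0), (0, 1), (0, 2)}
x_I = frozenset(free)
# The next I-state is unpredictable: it depends on the observation.
print(sorted(f_hat(x_I, "L", True, free)))    # blocked -> [(0, 0)]
print(sorted(f_hat(x_I, "L", False, free)))   # moved   -> [(0, 0), (0, 1)]
```

Either observation outcome shrinks the I-state, which is why $ {\vec{x}}_{k+1}$ cannot be predicted from $ {\vec{x}}_k$ and $ {\vec{u}}_k$ alone.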

Consider the approach of precomputing a plan. The methods of Section 12.1.2 can generally be applied to compute a plan, $ \pi : {\vec{X}}\rightarrow U$, that solves the localization problem from any initial nondeterministic I-state. The approach may be space-intensive because an action must be stored for every state in $ {\vec{X}}$. If there are $ n$ grid tiles, then $ \vert{\vec{X}}\vert = 2^n-1$. If the initial I-state is always $ X$, then it may be possible to restrict $ \pi $ to a much smaller portion of $ {\vec{X}}$. From any $ {\vec{x}}\in {\vec{X}}_G$, a search based on backprojections can be conducted. If the initial I-state is added to $ S$, the set of I-states already solved by the backward search, then the partial plan will reliably localize the robot. Parts of $ {\vec{X}}$ for which $ \pi $ is not specified will never be reached and can therefore be ignored.
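Continuing the sketch above (and reusing attempt, f_hat, and MOVES), the following hypothetical routine illustrates the restriction of $ \pi $ to reachable I-states. Rather than the backprojection-based backward search described in the text, it uses a simple depth-first AND-OR search from $ {\vec{x}}_I$: an action is accepted only if every possible observation outcome leads, recursively, to a singleton.

```python
# A depth-first AND-OR search over the I-space (a sketch, not the
# backprojection method of the text). The visited set forbids revisiting
# an I-state, which forces information to improve along every branch.

def and_or_plan(x_hat, free, visited=frozenset()):
    """Return (pi, success); pi maps reachable I-states to actions."""
    if len(x_hat) == 1:
        return {}, True                      # localized
    if x_hat in visited:
        return {}, False                     # no progress on this branch
    for u in MOVES:
        outcomes = {f_hat(x_hat, u, y, free) for y in (True, False)}
        outcomes.discard(frozenset())        # drop impossible observations
        plans, ok = [], True
        for o in outcomes:
            sub, s = and_or_plan(o, free, visited | {x_hat})
            if not s:
                ok = False
                break
            plans.append(sub)
        if ok:
            pi = {x_hat: u}                  # record the accepted action
            for sub in plans:
                pi.update(sub)
            return pi, True
    return {}, False

free = {(0, 0), (0, 1), (0, 2)}
pi, ok = and_or_plan(frozenset(free), free)
print(ok, len(pi))   # the plan covers only the I-states actually reachable
```

I-states that can never arise from $ {\vec{x}}_I$ are simply absent from the returned table, mirroring the observation that unspecified parts of $ {\vec{X}}$ can be ignored.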

Now consider the lazy approach. An algorithm running on the robot can perform a kind of search by executing actions and seeing which I-states result. This leads to a directed graph over $ {\vec{X}}$ that is incrementally revealed through the robot's motions. The graph is directed because the information regarding the state generally improves. For example, once the robot knows its state (or symmetry class of states), it cannot return to an I-state that represents greater uncertainty. In many cases, the robot may get lucky during execution and localize itself using much less memory than would be required for a precomputed plan.

To ensure a systematic search, the robot needs to recognize when the same position has been reached in different ways. Even though the robot does not necessarily know its position on the map, it can usually deduce whether it has been to some location previously. One way to achieve this is to assign $ (i,j)$ coordinates to the positions already visited. It starts with $ (0,0)$ assigned to the initial position. If $ {\rm F}$ is applied, then suppose that position $ (1,0)$ is reached, assuming the robot moves to a new grid cell. If $ {\rm R}$ is applied, then $ (0,1)$ is reached if the robot is not blocked. The point $ (2,1)$ may be reachable by $ ({\rm F},{\rm F},{\rm R})$ or $ ({\rm R},{\rm F},{\rm F})$. One way to interpret this is that a local coordinate frame in $ {\mathbb{R}}^2$ is attached to the robot's initial position. Let these be referred to as the odometric coordinates. The orientation between this coordinate frame and the map is not known in the beginning, but a transformation between the two can be computed if the robot is able to localize itself exactly.
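A small sketch of this bookkeeping follows, under the assumption (consistent with the earlier sketch) that the robot can tell whether each attempted move succeeded; the names ODOM and odometric_path are illustrative.

```python
# Odometric coordinates: a frame attached to the initial position, with
# F adding (1, 0) and R adding (0, 1) as in the text. The robot is
# assumed to know whether each attempted move succeeded.

ODOM = {"F": (1, 0), "B": (-1, 0), "R": (0, 1), "L": (0, -1)}

def odometric_path(actions, moved):
    """Odometric coordinates visited along a sequence of actions."""
    i, j = 0, 0
    path = [(0, 0)]
    for u, ok in zip(actions, moved):
        if ok:                          # a blocked move leaves (i, j) fixed
            i, j = i + ODOM[u][0], j + ODOM[u][1]
        path.append((i, j))
    return path

# (F, F, R) and (R, F, F) both end at (2, 1), as in the text:
print(odometric_path("FFR", [True] * 3)[-1])   # (2, 1)
print(odometric_path("RFF", [True] * 3)[-1])   # (2, 1)
```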

A variety of search algorithms can now be defined by starting in the initial state $ {\vec{x}}_I$ and trying actions until a goal condition is satisfied (e.g., no smaller nondeterministic I-states are reachable). There is, however, a key difference between this search and the search conducted by the algorithms in Section 2.2.1. Previously, the search could be continued from any already-explored state at no additional cost. In the current setting, there are two issues:

  1. Reroute paths: Most search algorithms enable new states to be expanded from any previously considered state at any time. For the lazy approach, the robot must move to a state and apply an action to determine whether a new state can be reached. The robot is capable of returning to any previously considered state by using its odometric coordinates, but this induces a cost that does not exist in the previous search problem. Rather than jumping from place to place in a search tree, the search becomes a long, continuous path that is traversed by the robot. Let the physical path that substitutes for such a jump be referred to as a reroute path. This will become important in Section 12.3.2.
  2. Information improvement: The robot may not even be able to return to a previous nondeterministic I-state. For example, if the robot follows $ ({\rm F},{\rm F},{\rm R})$ and then tries to retrace its steps using $ ({\rm L},{\rm B},{\rm B})$, it will indeed know that it has returned to the same position, but the state remains unknown. It might be the case, however, that after executing $ ({\rm F},{\rm F},{\rm R})$, it was able to narrow down the possibilities for its current state. Upon returning using $ ({\rm L},{\rm B},{\rm B})$, the nondeterministic I-state will be different.
The implication of these issues is that the search algorithm should take into account the cost of moving the robot and that the search graph is directed. The second issue is really not a problem because even though the I-state may be different when returning to the same position, it will always be at least as good as the previous one. This means that if $ {\eta}_1$ and $ {\eta}_2$ are the original and later history I-states from the same position, it will always be true that $ X({\eta}_2) \subseteq X({\eta}_1)$. Information always improves in this version of the localization problem. Thus, while trying to return to a previous I-state, the robot will find an improved I-state.
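The subset property can be seen directly in the earlier corridor sketch (with its assumed sensing model and compass-based actions): along one possible observation sequence, the robot moves left and then back right, returning to the same odometric position with a strictly smaller I-state.

```python
# Information improves upon returning: X(eta2) is a proper subset of
# X(eta1). This traces one possible observation sequence (both moves
# succeed) in the 1x3 corridor, reusing f_hat from the earlier sketch.
free = {(0, 0), (0, 1), (0, 2)}
eta1 = frozenset(free)                 # I-state at odometric (0, 0)
x = f_hat(eta1, "L", False, free)      # moved left:  {(0,0), (0,1)}
eta2 = f_hat(x, "R", False, free)      # moved right: {(0,1), (0,2)}
print(eta2 < eta1)                     # True: back at (0, 0), but improved
```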
