Terminology

Before detailing the method further, some explanation of existing names seems required. Consider the term reinforcement learning. In machine learning, most decision-theoretic models are expressed in terms of reward instead of cost. Thus, the task is to make decisions or find plans that maximize a reward functional. Choosing good actions under this model appears to provide positive reinforcement in the form of a reward. Therefore, the term reinforcement is used. If cost and minimization are used instead, some alternative names may be decision-theoretic learning or cost-based learning.

The term learning is associated with the problem because estimating the probability distribution $P(\theta \mid x,u)$ or $P(x' \mid x,u)$ is clearly a learning problem. However, it is important to remember that there is also the planning problem of computing cost-to-go functions (or reward-to-go functions) and determining a plan that optimizes the costs (or rewards). Therefore, the term reinforcement planning may be just as reasonable.

The general framework is referred to as neuro-dynamic programming in [97] because the formulation and resulting algorithms are based on dynamic programming. Most often, a variant of value iteration is obtained. The neuro part refers to a family of functions that can be used to approximate plans and cost-to-go values. This term is fairly specific, however, because other function families may be used. Furthermore, for some problems (e.g., over small, finite state spaces), the cost values and plans are represented without approximation.

The name simulation-based methods is used in [95], which is perhaps one of the most accurate names (when used in the context of dynamic programming). Thus, simulation-based dynamic programming or simulation-based planning nicely reflects the framework explained here. The term simulation comes from the fact that a Monte Carlo simulator is used to generate samples, from which the required distributions are learned during planning. You are, of course, welcome to use your favorite name, but keep in mind that under all of the names, the idea remains the same. This will be helpful to remember if you intend to study related literature.
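Whatever name is preferred, the two ingredients discussed above, learning $P(x' \mid x,u)$ from simulator samples and planning by value iteration, can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and does not come from the text: the five-state chain, the hypothetical simulator `simulate` with success probability 0.8, the unit cost per stage, and the sample count. For simplicity, the sketch learns the model first and plans afterward, rather than interleaving the two.

```python
import random
from collections import defaultdict

# Assumed toy problem (not from the text): states 0..4 on a line, goal at 4.
STATES = range(5)
ACTIONS = (-1, +1)   # u: attempt a step left or right
GOAL = 4

def simulate(x, u, rng):
    """Hypothetical Monte Carlo simulator that samples x' given (x, u)."""
    if x == GOAL:
        return x                      # the goal is absorbing
    if rng.random() < 0.8:            # intended move succeeds (assumed 0.8)
        return min(max(x + u, 0), GOAL)
    return x                          # otherwise the state does not change

def estimate_transitions(samples_per_pair, rng):
    """Learning: estimate P(x'|x,u) by relative frequencies of samples."""
    p_hat = {}
    for x in STATES:
        for u in ACTIONS:
            counts = defaultdict(int)
            for _ in range(samples_per_pair):
                counts[simulate(x, u, rng)] += 1
            p_hat[(x, u)] = {xn: c / samples_per_pair
                             for xn, c in counts.items()}
    return p_hat

def value_iteration(p_hat, tol=1e-9, max_iters=10_000):
    """Planning: compute cost-to-go G and a plan from the estimated model."""
    G = {x: 0.0 for x in STATES}

    def q(x, u):
        # Unit stage cost plus expected cost-to-go under the estimate.
        return 1.0 + sum(p * G[xn] for xn, p in p_hat[(x, u)].items())

    for _ in range(max_iters):
        new_G = {x: 0.0 if x == GOAL else min(q(x, u) for u in ACTIONS)
                 for x in STATES}
        delta = max(abs(new_G[x] - G[x]) for x in STATES)
        G = new_G
        if delta < tol:
            break
    # Extract the plan: the minimizing action at each non-goal state.
    plan = {x: min(ACTIONS, key=lambda u: q(x, u))
            for x in STATES if x != GOAL}
    return G, plan

rng = random.Random(0)                    # fixed seed for repeatability
p_hat = estimate_transitions(2000, rng)   # learning from samples
G, plan = value_iteration(p_hat)          # planning on the learned model
```

In this toy problem, the true expected cost-to-go from state 0 is $4/0.8 = 5$, so `G[0]` should be close to that value when enough samples are drawn, and `plan` should select the rightward action at every non-goal state.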

Steven M LaValle 2020-08-14