9.2.4.2 Parameter estimation

Another important application of the decision-making framework of this section is parameter estimation [89,268]. In this case, nature selects a parameter, $ \theta \in \Theta$, in which $ \Theta$ represents a parameter space. Through one or more independent trials, some observations are obtained. Each observation should ideally be a direct measurement of $ \theta$, but imperfections in the measurement process distort the observation. Usually, $ \Theta \subseteq Y$, and in many cases, $ Y = \Theta$. The robot action is to guess the parameter that was chosen by nature. Hence, $ U = \Theta$. In most applications, all of the spaces are continuous subsets of $ {\mathbb{R}}^n$. The cost function is designed to increase as the error, $ \Vert u - \theta\Vert$, becomes larger.

Example 9.12 (Parameter Estimation)   Suppose that $ U = Y = \Theta = {\mathbb{R}}$. Nature therefore chooses a real-valued parameter, which is estimated. The cost of making a mistake is

$\displaystyle L(u,\theta) = (u-\theta)^2 .$ (9.35)

Suppose that a Bayesian approach is taken. The prior probability density $ p(\theta)$ is given as uniform over an interval $ [a,b] \subset {\mathbb{R}}$. An observation is received, but it is noisy. The noise can be modeled as a second action of nature, as described in Section 9.2.3. This leads to a density $ p(y\vert\theta)$. Suppose that the noise is modeled with a Gaussian, which results in

$\displaystyle p(y\vert\theta) = \frac{1}{\sqrt{2 \pi \sigma^2}} \; e^{-(y-\theta)^2/2\sigma^2},$ (9.36)

in which the mean is $ \theta $ and the standard deviation is $ \sigma $.
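To make the model concrete, here is a minimal Python sketch of a single trial under these assumptions; the endpoints $ a$, $ b$ and the noise level $ \sigma $ are illustrative placeholders, not values from the text:

```python
import random

a, b = 0.0, 10.0   # prior: theta is uniform over [a, b] (placeholder endpoints)
sigma = 1.0        # standard deviation of the Gaussian observation noise

def sample_trial():
    """Nature chooses theta from the uniform prior, then produces a noisy
    observation y drawn from the Gaussian p(y | theta) of (9.36)."""
    theta = random.uniform(a, b)    # theta ~ p(theta)
    y = random.gauss(theta, sigma)  # y ~ p(y | theta): mean theta, std sigma
    return theta, y

theta, y = sample_trial()
print(f"true parameter: {theta:.3f}, observation: {y:.3f}")
```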

The optimal parameter estimate based on $ y$ is obtained by selecting $ u \in {\mathbb{R}}$ to minimize

$\displaystyle \int_{-\infty}^\infty L(u,\theta) p(\theta\vert y) \, d\theta,$ (9.37)

in which

$\displaystyle p(\theta\vert y) = \frac{p(y\vert\theta) p(\theta)}{p(y)},$ (9.38)

by Bayes' rule. The term $ p(y)$ does not depend on $ \theta $, and it can therefore be ignored in the optimization. Using the prior density, $ p(\theta) = 0$ outside of $ [a,b]$; hence, the domain of integration can be restricted to $ [a,b]$. The value $ p(\theta) = 1/(b-a)$ is also a constant that can be ignored in the optimization. Using (9.36), this means that $ u$ is selected to minimize

$\displaystyle \int_a^b L(u,\theta) p(y\vert\theta) \, d\theta,$ (9.39)

which can be expressed in terms of the standard error function, $ \operatorname{erf}(x)$ (the integral of a Gaussian density from 0 to a constant).
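Since the loss (9.35) is quadratic, the minimizer of (9.37) is the posterior mean of $ \theta $ given $ y$. Under the uniform prior and Gaussian noise, the posterior is a Gaussian centered at $ y$ and truncated to $ [a,b]$, whose mean has a closed form in terms of $ \operatorname{erf}$. The following Python sketch (a hypothetical helper, under the placeholder assumptions above) evaluates it:

```python
import math

def gaussian_pdf(x):
    """Standard Gaussian density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def gaussian_cdf(x):
    """Standard Gaussian cumulative distribution, written via erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def estimate(y, a, b, sigma):
    """Bayes-optimal estimate under squared-error loss: the mean of the
    posterior, a Gaussian centered at y and truncated to [a, b]."""
    alpha = (a - y) / sigma
    beta = (b - y) / sigma
    z = gaussian_cdf(beta) - gaussian_cdf(alpha)   # posterior normalizer
    return y + sigma * (gaussian_pdf(alpha) - gaussian_pdf(beta)) / z

print(estimate(y=4.2, a=0.0, b=10.0, sigma=1.0))  # close to 4.2
```

For observations well inside $ [a,b]$ the estimate is nearly $ y$ itself; near the endpoints the truncation pulls the estimate back into the interval.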

If a sequence, $ y_1$, $ \ldots $, $ y_k$, of independent observations is obtained, then (9.39) is replaced by

$\displaystyle \int_a^b L(u,\theta) p(y_1\vert\theta) \cdots p(y_k\vert\theta) \, d\theta .$ (9.40)
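As a function of $ \theta $, the product of Gaussian likelihoods in (9.40) is proportional to a single Gaussian with mean equal to the sample average $ \bar{y}$ and standard deviation $ \sigma/\sqrt{k}$, so the single-observation estimate applies with those substitutions. Continuing the sketch above and reusing the hypothetical estimate helper:

```python
def estimate_from_sequence(ys, a, b, sigma):
    """Combine k independent observations: their likelihood product is
    proportional (in theta) to a Gaussian with mean ybar and std
    sigma / sqrt(k), so the truncated-posterior-mean formula applies."""
    k = len(ys)
    ybar = sum(ys) / k
    return estimate(ybar, a, b, sigma / math.sqrt(k))

print(estimate_from_sequence([4.0, 4.5, 3.8], a=0.0, b=10.0, sigma=1.0))
```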

$ \blacksquare$
