Testing the hypothesis

Unfortunately, the scientist is not able to perform the same experiment at the same time on all people. She must instead draw a small set of people from the population and make a determination about whether the hypothesis is true. Let the index $j$ refer to a particular chosen subject, and let $y[j]$ be his or her response for the experiment; each subject's response is a dependent variable. Two statistics are important for combining information from the dependent variables: the mean,

$\displaystyle \hat{\mu}= \frac{1}{n} \sum_{j=1}^n y[j] ,$

(12.3)

which is simply the average of $y[j]$ over the $n$ subjects, and the variance, which is

$\displaystyle \hat{\sigma}^2 = \frac{1}{n} \sum_{j=1}^n (y[j] - \hat{\mu})^2 .$

(12.4)

The variance estimate (12.4) is considered to be a biased estimator for the ``true'' variance; therefore, Bessel's correction is sometimes applied, which places $n-1$ into the denominator instead of $n$, resulting in an unbiased estimator.
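As a minimal sketch of (12.3) and (12.4), the following Python computes the mean and both variants of the variance. The function names and the five response values are hypothetical and chosen only for illustration; they are not taken from any experiment.

```python
def sample_mean(y):
    """Mean of the responses, as in (12.3)."""
    return sum(y) / len(y)


def sample_variance(y, bessel=False):
    """Variance of the responses, as in (12.4).

    With bessel=True the denominator is n - 1 instead of n
    (Bessel's correction).
    """
    n = len(y)
    mu_hat = sample_mean(y)
    squared_deviations = sum((v - mu_hat) ** 2 for v in y)
    return squared_deviations / (n - 1 if bessel else n)


# Hypothetical responses y[1], ..., y[5] for n = 5 subjects.
y = [2.0, 4.0, 4.0, 5.0, 5.0]

mu_hat = sample_mean(y)                          # 20 / 5 = 4.0
var_biased = sample_variance(y)                  # 6 / 5 = 1.2
var_unbiased = sample_variance(y, bessel=True)   # 6 / 4 = 1.5
```

The corrected estimate is always the larger of the two, and the gap shrinks as $n$ grows.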

**Figure 12.5:** Student's t distribution: (a) probability density function (pdf); (b) cumulative distribution function (cdf). In the figures, $\nu$ is called the *degrees of freedom*, and $\nu = n-1$ for the number of subjects $n$. When $\nu$ is small, the pdf has larger tails than the normal distribution; however, in the limit as $\nu$ approaches $\infty$, the Student t distribution converges to the normal distribution. (Figures by Wikipedia user skbkekas.)

To test the hypothesis, Student's t-distribution (``Student'' was William Sealy Gosset) is widely used. It is a probability distribution that captures how the sample mean $\hat{\mu}$ is distributed if $n$ subjects are chosen at random and their responses are averaged; see Figure 12.5. This assumes that the response for each individual follows a normal distribution (called a Gaussian distribution in engineering), which is the most basic and common probability distribution. The normal distribution is fully characterized in terms of its mean $\mu$ and standard deviation $\sigma$. The exact expressions for these distributions are not given here, but are widely available; see [125] and other books on mathematical statistics for these and many more.

The Student's t test [319] involves calculating the following:

$\displaystyle t = {\hat{\mu}_1 - \hat{\mu}_2 \over \hat{\sigma}_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} ,$

(12.5)

in which

$\displaystyle \hat{\sigma}_p = \sqrt{(n_1 - 1) \hat{\sigma}_1^2 + (n_2 - 1) \hat{\sigma}_2^2 \over n_1 + n_2 - 2}$

(12.6)

and $n_i$ is the number of subjects who received treatment $i$, for $i = 1, 2$. The subtractions by $1$ and $2$ in the expressions are due to Bessel's correction. Based on the value of $t$, the confidence $\alpha$ in the null hypothesis $H_0$ is determined by looking in a table of the Student's t cdf (Figure 12.5(b)). Typically, $\alpha = 0.05$ or lower is sufficient to declare that the hypothesis $H_1$ is true (corresponding to 95% confidence). Such tables are usually arranged so that for a given $\nu$ and $\alpha$, the minimum $t$ value needed to confirm $H_1$ with confidence $1-\alpha$ is presented. Note that if $t$ is negative, then the effect of the treatment on the response runs in the opposite direction, and $-t$ is applied to the table.
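The calculation in (12.5) and (12.6), followed by the table lookup, might be sketched in Python as below. Several things here are assumptions and not part of the text: the function names, the two groups of response values (hypothetical), the use of $\nu = n_1 + n_2 - 2$ degrees of freedom for the pooled test, and the replacement of a printed table by numerical integration of the standard Student's t pdf with Simpson's rule.

```python
import math


def pooled_t(y1, y2):
    """Return (t, nu) following (12.5) and (12.6)."""
    n1, n2 = len(y1), len(y2)
    mu1, mu2 = sum(y1) / n1, sum(y2) / n2
    # Bessel-corrected variance estimates for each group.
    var1 = sum((v - mu1) ** 2 for v in y1) / (n1 - 1)
    var2 = sum((v - mu2) ** 2 for v in y2) / (n2 - 1)
    # Pooled standard deviation, as in (12.6).
    sigma_p = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    t = (mu1 - mu2) / (sigma_p * math.sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def t_pdf(x, nu):
    """Student's t pdf with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))


def t_cdf(t, nu, steps=2000):
    """Student's t cdf via Simpson's rule on [0, |t|]; steps must be even."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, nu) + t_pdf(a, nu)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, nu)
    half = total * h / 3  # probability mass between 0 and |t|
    return 0.5 + half if t >= 0 else 0.5 - half


# Hypothetical responses for the two treatment groups.
group1 = [5.0, 6.0, 7.0, 8.0]
group2 = [3.0, 4.0, 5.0, 6.0]

t, nu = pooled_t(group1, group2)
# If t is negative, -t is applied; abs(t) covers both cases.
p = 1.0 - t_cdf(abs(t), nu)   # one-sided tail probability
reject_null = p <= 0.05       # threshold alpha = 0.05
```

Here `p` plays the role of the smallest $\alpha$ for which the observed $t$ would still meet the table threshold, so comparing it with $0.05$ gives the same decision as the lookup.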

The binary outcome might not be satisfying enough. This is not a problem because the difference in means, $\hat{\mu}_1 - \hat{\mu}_2$, is an estimate of the amount of change that applying one treatment had in comparison to the other. This is called the average treatment effect. Thus, in addition to determining whether the hypothesis is true via the t-test, we also obtain an estimate of how much the treatment affects the outcome.

Student's t-test assumes that the variance within each group is identical. If it is not, then Welch's t-test is used [351]. Note that the variances are not given in advance in either case. They are estimated ``on the fly'' from the experimental data. Welch's t-test gives essentially the same result as Student's t-test if the variances happen to be the same; therefore, when in doubt, it may be best to apply Welch's t-test. Many other tests can be used and are debated in particular contexts by scientists; see [125].
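For comparison, a minimal sketch of Welch's statistic follows. Its formulas do not appear in this section; the code uses the standard textbook form, in which each group keeps its own Bessel-corrected variance and the degrees of freedom come from the Welch-Satterthwaite approximation. The function name and the data are hypothetical.

```python
import math


def welch_t(y1, y2):
    """Return (t, nu) for Welch's t-test (unequal variances allowed)."""
    n1, n2 = len(y1), len(y2)
    mu1, mu2 = sum(y1) / n1, sum(y2) / n2
    # Bessel-corrected variance of each group, kept separate (no pooling).
    var1 = sum((v - mu1) ** 2 for v in y1) / (n1 - 1)
    var2 = sum((v - mu2) ** 2 for v in y2) / (n2 - 1)
    # Squared standard error of each group mean.
    se1, se2 = var1 / n1, var2 / n2
    t = (mu1 - mu2) / math.sqrt(se1 + se2)
    # Welch-Satterthwaite approximation; generally not an integer.
    nu = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, nu


# Hypothetical groups with visibly different spread and size.
group1 = [5.0, 6.0, 7.0, 8.0]
group2 = [1.0, 4.0, 5.0, 10.0, 15.0]
t_welch, nu_welch = welch_t(group1, group2)

# With equal sample variances and equal group sizes, the statistic and
# the degrees of freedom coincide with the pooled test (nu = n1 + n2 - 2).
t_equal, nu_equal = welch_t([5.0, 6.0, 7.0, 8.0], [3.0, 4.0, 5.0, 6.0])
```

Because `nu` is usually fractional, a printed table needs interpolation or rounding down, whereas a numerical cdf accepts it directly.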

Steven M LaValle 2020-11-11