Chapter 7: Inference for Means

In the last chapter, we encountered the population parameter \(\sigma\) when calculating the margin of error for confidence intervals and computing the \(z\) test statistic for hypothesis testing. However, in practice, this \(\sigma\) is usually unknown.

Addressing the Unknown \(\sigma\):

In this chapter, we address this issue by making a more realistic assumption, namely that we do not know the value of \(\sigma\), and explore how we can still conduct inference under two key settings:
1. Single-Sample Inference:
- We have one sample and are interested in estimating the population mean \(\mu\).
- This scenario mirrors what we saw in the previous chapter but now incorporates the unknown \(\sigma\).
2. Two-Sample Inference:
- We have two independent samples, meaning we are dealing with two populations.
- Our goal is to analyze the difference between the two population means, denoted as \(\mu_1 - \mu_2\).

Real-World Applications:

Many real-world problems fit into one of these two frameworks. Whether comparing before-and-after treatment effects, different experimental conditions, or population differences, these two settings provide a foundation for statistical inference when the population standard deviation \(\sigma\) is unknown.

7.2. Comparing Two Means

Now, let’s shift our focus to two samples from two distinct populations. This scenario arises frequently in many real-world applications where we wish to compare two population means.

Fortunately, under mild conditions, the difference between the two sample means, \(\bar{X}_1 - \bar{X}_2\), is approximately normal. This allows us to extend the statistical procedures we learned for the one-sample case to this new two-sample setting.

Key Insights:

Just as in the one-sample case, we have both \(z\) statistics and \(t\) statistics for two-sample inference.
When the population standard deviations \(\sigma_1\) and \(\sigma_2\) are unknown and not assumed to be equal, we use the standard two-sample \(t\) procedure.
However, if the two populations are assumed to have equal standard deviations (\(\sigma_1 = \sigma_2\)), we can pool the variance to obtain a more efficient procedure, known as the pooled two-sample \(t\) test.

By leveraging these different approaches, we can increase the accuracy and efficiency of our statistical inference when comparing two population means.

Goals and Setup

Objective: Compare the means of a response variable in two different groups (populations).
Examples:
- Comparing college students’ impressions from Wisconsin vs. Indiana.
- Testing two different diets and measuring their impact on blood pressure.
- Evaluating two incentive plans for debit card usage.
Key Conditions:
- Each group is viewed as a distinct population.
- We gather independent SRSs: one from each population.
- The responses in group 1 are independent from those in group 2.

Two-Sample z Statistic

Notation

Population 1: Mean \(\mu_1\), Standard Deviation \(\sigma_1\), Sample Size \(n_1\), Sample Mean \(\bar{x}_1\), Sample SD \(s_1\).
Population 2: Mean \(\mu_2\), Standard Deviation \(\sigma_2\), Sample Size \(n_2\), Sample Mean \(\bar{x}_2\), Sample SD \(s_2\).

We often compare \(\mu_1\) and \(\mu_2\) by examining the difference \(\mu_1 - \mu_2\). In practice, we estimate this using \(\bar{x}_1 - \bar{x}_2\).

Two-Sample z Statistic (Population SDs Known)

If \(\sigma_1\) and \(\sigma_2\) are known:

The sampling distribution of \((\bar{x}_1 - \bar{x}_2)\) is Normal if both populations are Normal (or \(n_1, n_2\) are large enough by the Central Limit Theorem).
Mean of \((\bar{x}_1 - \bar{x}_2)\) is \((\mu_1 - \mu_2)\).
Variance of \((\bar{x}_1 - \bar{x}_2)\) is \(\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\), assuming independence between the two samples.

The two-sample \(z\) statistic is:

\[z \;=\; \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}.\]
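As a minimal sketch of the formula above, the following computes the two-sample \(z\) statistic from summary values. The function name `two_sample_z`, the hypothesized difference `delta0`, the two-sided p-value, and all numbers are illustrative assumptions, not values from the text.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(xbar1, xbar2, sigma1, sigma2, n1, n2, delta0=0.0):
    """Two-sample z statistic and two-sided p-value (sigmas assumed known)."""
    # Standard deviation of xbar1 - xbar2 for independent samples
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    z = ((xbar1 - xbar2) - delta0) / se
    # Two-sided p-value from the standard Normal distribution
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided

# Hypothetical summary values, chosen only for illustration
z, p = two_sample_z(xbar1=52.0, xbar2=50.0, sigma1=4.0, sigma2=3.0, n1=64, n2=36)
print(f"z = {z:.3f}, two-sided p = {p:.4f}")
```

Here the standard error is \(\sqrt{16/64 + 9/36} = \sqrt{0.5}\), so the observed difference of 2 is a little under three standard errors from zero.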

Two-Sample t Procedures

When Population SDs Are Unknown: Two-Sample t Procedures

Typically, \(\sigma_1\) and \(\sigma_2\) are unknown in real-world settings.
We replace \(\sigma_1\) and \(\sigma_2\) by their sample estimates \(s_1\) and \(s_2\).
The resulting test or confidence interval is the two-sample t procedure.
As with the one-sample case, if the sample sizes are reasonably large (or if each group is approximately Normal), these t procedures are robust to moderate deviations from Normality.

We’ll derive the exact form of the two-sample t statistic, its degrees of freedom, and how to construct confidence intervals for \(\mu_1 - \mu_2\).

1. When Do We Use Two-Sample t Procedures?

Unknown \(\sigma_1\) and \(\sigma_2\): In real-world settings, population standard deviations are typically unknown.
Two Independent SRSs: We draw samples of sizes \(n_1\) and \(n_2\) from two populations with means \(\mu_1\) and \(\mu_2\), respectively.
Comparison of Means: We want to infer about \(\mu_1 - \mu_2\), either via hypothesis testing (e.g., \(H_0: \mu_1 = \mu_2\)) or a confidence interval (CI) for \(\mu_1 - \mu_2\).

By replacing \(\sigma_1\) and \(\sigma_2\) with their sample estimates (\(s_1\) and \(s_2\)), we obtain a t-based procedure instead of the (rarely used) two-sample \(z\) procedure.

2. The Two-Sample t Statistic

When \(\sigma_1\) and \(\sigma_2\) are unknown, the two-sample t statistic is

\[t \;=\; \frac{\bigl(\bar{x}_1 - \bar{x}_2\bigr) - \bigl(\mu_1 - \mu_2\bigr)}{\sqrt{\frac{s_1^2}{n_1} \;+\; \frac{s_2^2}{n_2}}}.\]

We typically test \(H_0: \mu_1 = \mu_2\) (or \(\mu_1 - \mu_2 = 0\)).
Under \(H_0\), \(\mu_1 - \mu_2 = 0\), so the statistic simplifies to

\[t \;=\; \frac{(\bar{x}_1 - \bar{x}_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}.\]

Degrees of Freedom (\(k\)):

The statistic does not follow an exact t distribution; its exact distribution depends on the unknown \(\sigma_1\) and \(\sigma_2\).
Software commonly uses the Satterthwaite approximation, which yields a t distribution with \(k\) degrees of freedom (not necessarily an integer).
No software? A conservative approximation is \(k = \min(n_1-1,\; n_2-1)\).
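The text does not write out the Satterthwaite formula, so the sketch below uses its standard form, \(k = \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{\frac{(s_1^2/n_1)^2}{n_1-1} + \frac{(s_2^2/n_2)^2}{n_2-1}}\), and compares it with the conservative choice. The function name and the summary values are hypothetical.

```python
from math import sqrt

def welch_t_and_df(xbar1, s1, n1, xbar2, s2, n2, delta0=0.0):
    """Two-sample t statistic with Satterthwaite and conservative df."""
    v1, v2 = s1**2 / n1, s2**2 / n2          # estimated variances of each sample mean
    t = ((xbar1 - xbar2) - delta0) / sqrt(v1 + v2)
    # Satterthwaite approximation (usually not an integer)
    df_satterthwaite = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    # Conservative fallback when no software is available
    df_conservative = min(n1 - 1, n2 - 1)
    return t, df_satterthwaite, df_conservative

# Hypothetical summary statistics, for illustration only
t_stat, k, k_cons = welch_t_and_df(10.0, 2.0, 10, 8.0, 4.0, 20)
print(f"t = {t_stat:.3f}, Satterthwaite df = {k:.2f}, conservative df = {k_cons}")
```

The Satterthwaite value always lies between \(\min(n_1-1, n_2-1)\) and \(n_1 + n_2 - 2\), which is why the conservative choice gives a wider interval and a larger p-value.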

Two-Sample t Confidence Interval

3. Two-Sample t Confidence Interval

A level \(C\) confidence interval for \(\mu_1 - \mu_2\) is:

\[(\bar{x}_1 - \bar{x}_2) \;\pm\; t^* \,\sqrt{\frac{s_1^2}{n_1} \;+\; \frac{s_2^2}{n_2}},\]

where \(t^*\) is the critical value from the \(t(k)\) distribution cutting off an area of \(\frac{1 - C}{2}\) in each tail. The degrees of freedom \(k\) is approximated as above.
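A possible implementation of this interval from summary statistics is sketched below. It assumes SciPy is available and uses `scipy.stats.t.ppf` for the critical value; the function name `two_sample_t_ci`, the `conservative` switch, and the example numbers are illustrative assumptions.

```python
from math import sqrt
from scipy import stats

def two_sample_t_ci(xbar1, s1, n1, xbar2, s2, n2, conf=0.95, conservative=False):
    """Level-`conf` two-sample t interval for mu1 - mu2 from summary statistics."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    se = sqrt(v1 + v2)
    if conservative:
        k = min(n1 - 1, n2 - 1)                                   # hand-calculation rule
    else:
        k = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))  # Satterthwaite
    # t* cuts off area (1 - C)/2 in each tail of t(k)
    t_star = stats.t.ppf(1 - (1 - conf) / 2, df=k)
    diff = xbar1 - xbar2
    return diff - t_star * se, diff + t_star * se

# Hypothetical summary statistics, for illustration only
lo, hi = two_sample_t_ci(10.0, 2.0, 10, 8.0, 4.0, 20)
lo_c, hi_c = two_sample_t_ci(10.0, 2.0, 10, 8.0, 4.0, 20, conservative=True)
print(f"Satterthwaite df: ({lo:.3f}, {hi:.3f})")
print(f"Conservative df:  ({lo_c:.3f}, {hi_c:.3f})")
```

Both intervals are centered at \(\bar{x}_1 - \bar{x}_2\); the conservative one is wider because fewer degrees of freedom give a larger \(t^*\).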

Assumptions/Conditions:

Independent samples: The two samples must be independent of each other.
Approximate Normality: For small samples, data from each population should be (roughly) Normal. For large \(n_1, n_2\), the t procedures are robust.

4. Summary

Two-sample \(z\) test: Rarely used because it requires known \(\sigma_1\) and \(\sigma_2\).
Two-sample t: Replaces unknown \(\sigma_1, \sigma_2\) with \(s_1, s_2\).
- Hypothesis Tests: \(H_0: \mu_1 = \mu_2 \quad \text{vs.}\quad H_a: \mu_1 \neq \mu_2\) (or \(<\), \(>\)).
- Confidence Intervals: Estimate \(\mu_1 - \mu_2\) with an interval.
- Degrees of Freedom: Usually approximated by software.

These procedures allow you to draw inferences on the difference between two population means, even if you do not know the actual population standard deviations.

Two-Sample t Significance Test

1. Hypothesis and Test Statistic

Hypotheses: Typically, we test

\[ H_0: \mu_1 - \mu_2 = \Delta_0 \quad\text{vs.}\quad H_a: \mu_1 - \mu_2 \neq \Delta_0 \quad(\text{or } <, >).\]

A common case is \(\Delta_0 = 0\).
Two-Sample t Statistic:

\[t = \frac{(\bar{x}_1 - \bar{x}_2) - \Delta_0} {\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}.\]

The degrees of freedom \(k\) are approximated (often by software using the Satterthwaite method).
p-Value: We use the \(t(k)\) distribution to find the p-value or critical value for the test.
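With raw data, the test can be run in software. The sketch below assumes SciPy and uses `scipy.stats.ttest_ind` with `equal_var=False`, which requests the unequal-variance test; it then repeats the calculation by hand for \(\Delta_0 = 0\) as a check. The two data lists are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance
from scipy import stats

# Hypothetical measurements from two independent groups
group1 = [24.1, 26.3, 22.8, 27.5, 25.0, 23.9, 26.8, 25.4]
group2 = [21.7, 23.0, 20.5, 24.2, 22.1, 19.8, 23.5]

# Unequal-variance two-sample t test of H0: mu1 - mu2 = 0 (two-sided)
result = stats.ttest_ind(group1, group2, equal_var=False)

# The same calculation by hand
n1, n2 = len(group1), len(group2)
v1, v2 = variance(group1) / n1, variance(group2) / n2   # s^2 / n for each group
t_manual = (mean(group1) - mean(group2)) / sqrt(v1 + v2)
k = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
p_manual = 2 * stats.t.sf(abs(t_manual), df=k)          # two-sided p-value from t(k)

print(f"scipy:  t = {result.statistic:.4f}, p = {result.pvalue:.5f}")
print(f"manual: t = {t_manual:.4f}, p = {p_manual:.5f}, df = {k:.2f}")
```

Note that `ttest_ind` pools the variances unless `equal_var=False` is passed, so the argument matters here.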

2. Robustness and Sample Size Guidelines

Just as with the one-sample t, these two-sample procedures are generally robust against moderate non-Normality.

Let \(n_1 + n_2 = n_{\text{total}}\).
If \(n_{\text{total}} < 15\), use t tests only if both samples look reasonably Normal (no major skew or outliers).
If \(15 \le n_{\text{total}} < 40\), the t test works well unless you see outliers or strong skewness.
If \(n_{\text{total}} \ge 40\), the t methods are quite robust, even for skewed data.

Equal Sample Sizes are especially recommended for better robustness and more accurate p-values.

3. Practical Tips

Choosing Labels: You may label either sample as “population 1” and the other as “population 2.” A common choice is to label them so that the test statistic is positive, which avoids confusion about negative values of \(t\).
- Important: This is not the same as changing from a two-sided to one-sided test after seeing the data, which would be improper.
Small Samples: With very small \(n_1\) and \(n_2\), we have limited power, and confidence intervals become wide. Even so, if an effect is large, it can still be detected with small samples, but proceed with caution when distributions are unknown or heavily skewed.

Pooled Two-Sample t Procedures

1. Key Assumption: Equal Standard Deviations

We assume both populations have the same (but unknown) standard deviation \(\sigma\).
If true, we can combine (or pool) the two sample variances into a single estimate \(s_p^2\) to improve efficiency.

2. The Pooled Variance and t Statistic

Pooled Variance:

\[s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}.\]

Degrees of Freedom: \(\;n_1 + n_2 - 2\).
Pooled Two-Sample t Statistic for testing \(H_0: \mu_1 - \mu_2 = \Delta_0\):

\[t = \frac{(\bar{x}_1 - \bar{x}_2) - \Delta_0} {s_p\,\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}.\]

The corresponding confidence interval for \(\mu_1 - \mu_2\) uses the same \(s_p\) and t-distribution with \(n_1 + n_2 - 2\) degrees of freedom.
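The pooled formulas above can be sketched directly from summary statistics. The function name `pooled_t` and the example values are hypothetical; the two sample standard deviations were chosen to be similar, since the procedure assumes \(\sigma_1 = \sigma_2\).

```python
from math import sqrt

def pooled_t(xbar1, s1, n1, xbar2, s2, n2, delta0=0.0):
    """Pooled variance, pooled two-sample t statistic, and its degrees of freedom."""
    df = n1 + n2 - 2
    # Weighted average of the sample variances, weights n1 - 1 and n2 - 1
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    se = sqrt(sp2) * sqrt(1 / n1 + 1 / n2)
    t = ((xbar1 - xbar2) - delta0) / se
    return sp2, t, df

# Hypothetical summary statistics, for illustration only
sp2, t_stat, df = pooled_t(20.0, 3.0, 12, 17.0, 3.5, 15)
print(f"s_p^2 = {sp2:.2f}, t = {t_stat:.3f}, df = {df}")
```

The pooled variance always falls between \(s_1^2\) and \(s_2^2\), closer to the variance of the larger sample.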

3. Advantages and Caveats

Advantages:
- Higher degrees of freedom than the general (unequal-\(\sigma\)) two-sample t test, which may yield slightly narrower confidence intervals and smaller p-values if the equal-\(\sigma\) assumption holds true.
Risks:
- The condition “\(\sigma_1 = \sigma_2\)” can be difficult to verify in practice.
- If the population standard deviations differ substantially or the sample sizes are unbalanced, the pooled procedure can be misleading.
- In modern practice, the unequal-variance (unpooled) t approach is often safer, and many software packages default to it (with the Satterthwaite approximation) unless you explicitly request pooling.

4. Why Use a Pooled Estimator for \(\sigma^2\)?

The formula assigns weights proportional to the degrees of freedom in each sample, emphasizing the larger sample if \(n_1 \neq n_2\).
Benefits:
- Higher Degrees of Freedom: The pooled procedure uses \(n_1 + n_2 - 2\) df, often larger than the approximate df from the unequal-variance (Satterthwaite) method.
- Efficiency: By combining variance estimates, we can get a more precise measure of the common \(\sigma\) when \(\sigma_1 = \sigma_2\) is truly valid.
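- Narrower Intervals: Tends to give slightly narrower confidence intervals if the equal-variance assumption holds, mainly because the larger df gives a smaller critical value \(t^*\); with equal sample sizes the pooled and unpooled standard errors coincide.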

7.3. Sample Size Calculations

For all the statistical procedures we have introduced so far, in addition to the population standard deviation \(\sigma\) and sample standard deviation \(s\), we also see the sample size \(n\) in the formulas.

In theory, the sample size \(n\) is known to researchers, but in practice, when planning a study, choosing the right sample size is an important decision. The goal is to ensure that the margin of error stays below a prespecified value \(m\) at a given confidence level.

Trade-off: Precision vs. Cost

As the sample size increases, the margin of error decreases, improving precision.
However, collecting more data comes with increased time, cost, and effort.
This is why planning ahead is crucial: we need a sample size large enough to achieve the desired margin of error \(m\), but not so large that it wastes resources.

Starting with the Margin of Error Formula

To determine the required sample size, we begin with the formula for the margin of error, which depends on the confidence level and the variability in the population.

1. Margin of Error for a Mean \(\mu\)

A one-sample t confidence interval has the margin of error:

\[m = t^* \cdot \frac{s}{\sqrt{n}},\]

where:

\(n\) is the sample size,
\(t^*\) is the t-critical value for the desired confidence level (depends on \(\text{df} = n - 1\)),
\(s\) is the sample standard deviation after collecting data, but we guess a value \(s_g\) beforehand when planning.

Goal: Choose \(n\) to achieve an expected margin of error \(\le m\).
Since we won’t know the actual \(s\) until we sample, we use our best guess \(s_g\) for calculations.

If previous studies or pilot data are available, we might estimate \(\sigma\) from those results.
In the absence of prior data, subject-matter expertise or known properties of the data can guide us.

A simple rule of thumb:

\[s_g = \frac{\text{range}}{4}.\]

Justification: For many roughly bell-shaped distributions, the range spans about 4 standard deviations (from roughly \(\mu - 2\sigma\) to \(\mu + 2\sigma\)), so dividing the range by 4 provides a crude estimate of \(\sigma\).

Although not exact, it can be a useful starting point for sample-size or margin-of-error calculations when more precise estimates of \(\sigma\) are unavailable.

2. Iterative Search Approach

Because \(t^*\) itself depends on \(n\), an iterative method is often used:

Initial Approximation:

Replace \(t^*\) with the corresponding z-value (as if \(\sigma\) were known).
Solve

\[ n = \Bigl(\frac{z^*\,s_g}{m}\Bigr)^2\]
and round up to the nearest integer.

Refine Using t Distribution:

With the tentative \(n\), find the actual \(t^*\) for confidence level \(C\) and \(\text{df} = n-1\).
Check if \(m \ge t^* \cdot \frac{s_g}{\sqrt{n}}\).
If not met, increment \(n\) and repeat.

This process continues until the requirement \(m \ge t^*\frac{s_g}{\sqrt{n}}\) is satisfied.
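The iterative search can be sketched as follows, assuming SciPy is available for the t critical value. The function names, the guessed value \(s_g = 10\), and the target \(m = 2\) are illustrative assumptions.

```python
from math import ceil, sqrt
from statistics import NormalDist
from scipy import stats

def t_margin(n, s_g, conf=0.95):
    """Planned margin of error t* * s_g / sqrt(n), with df = n - 1."""
    t_star = stats.t.ppf(1 - (1 - conf) / 2, df=n - 1)
    return t_star * s_g / sqrt(n)

def sample_size_for_mean(m, s_g, conf=0.95):
    """Search upward from the z-based guess until the t margin is at most m."""
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    n = max(2, ceil((z_star * s_g / m) ** 2))   # initial approximation using z*
    while t_margin(n, s_g, conf) > m:           # refine using the t distribution
        n += 1
    return n

# Hypothetical planning values: guessed SD 10, desired margin 2, 95% confidence
n_z = ceil((NormalDist().inv_cdf(0.975) * 10.0 / 2.0) ** 2)
n_t = sample_size_for_mean(m=2.0, s_g=10.0, conf=0.95)
print(f"z-based n = {n_z}, t-based n = {n_t}")
```

Because \(t^* > z^*\) for every finite df, the refined \(n\) is never smaller than the initial z-based value, and the gap is usually only a few observations.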

3. Two-Sample Case

To design for a two-sample t confidence interval for \(\mu_1 - \mu_2\):
- Often assume equal sample sizes \(n_1 = n_2 = n\) and similar standard deviations \(s_1 \approx s_2\).
- The margin of error for \(\bar{x}_1 - \bar{x}_2\) becomes
  
  \[ m = t^* \cdot s_g \,\sqrt{\frac{1}{n} + \frac{1}{n}} \;=\; t^*\cdot s_g\sqrt{\frac{2}{n}}.\]
- We again guess \(s_g\) and use an iterative method (now with \(\text{df} \approx 2(n-1)\)) if we need a more precise approach.
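The same search adapts to the two-sample design with equal group sizes. This is a sketch under the assumptions stated above (a common guessed SD \(s_g\) and \(\text{df} \approx 2(n-1)\)); it assumes SciPy, and the function names and planning values are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist
from scipy import stats

def two_sample_margin(n, s_g, conf=0.95):
    """Planned margin t* * s_g * sqrt(2/n) with n per group and df = 2(n - 1)."""
    t_star = stats.t.ppf(1 - (1 - conf) / 2, df=2 * (n - 1))
    return t_star * s_g * sqrt(2 / n)

def per_group_sample_size(m, s_g, conf=0.95):
    """Search upward from the z-based guess for the per-group sample size."""
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    n = max(2, ceil(2 * (z_star * s_g / m) ** 2))   # solve m = z* s_g sqrt(2/n) for n
    while two_sample_margin(n, s_g, conf) > m:
        n += 1
    return n

# Hypothetical planning values: guessed SD 10, desired margin 3, 95% confidence
n_per_group = per_group_sample_size(m=3.0, s_g=10.0)
print(f"n per group = {n_per_group}, total = {2 * n_per_group}")
```

For the same \(s_g\) and \(m\), each group needs roughly twice the one-sample size, because the variance of a difference of two means is the sum of two variances.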

In addition to planning for the sample size \(n\), we often also need to consider the power of our hypothesis test. These two concepts are closely related: of the factors listed below that influence power, the first three also affect the margin of error.

Significance Level \(\alpha\)
- A 5% test (\(\alpha=0.05\)) is more likely to reject \(H_0\) than a 1% test (\(\alpha=0.01\)) for the same true alternative, simply because less evidence is required to reject \(H_0\).
Population Standard Deviation \(\sigma\)
- More variability (larger \(\sigma\)) makes it harder to detect a true difference from \(H_0\).
- With less variability, the test statistic is more sensitive to departures from \(H_0\).
Sample Size \(n\)
- Larger \(n\) reduces the standard error, making it easier to detect a given difference.
- Power increases as \(n\) grows.
The Alternative (Effect Size)
- The farther the true parameter is from the null hypothesis value, the easier it is to reject \(H_0\).
- We often measure this distance as “effect size” = \(\frac{\text{departure from }H_0}{\sigma}\).

Before collecting data, researchers often choose:
1. \(\alpha\): The significance level (e.g., 5%).
2. Minimum Detectable Effect: The smallest departure from \(H_0\) that matters in practice.
3. Desired Power (e.g., 80% or 90%).
4. Estimate of \(\sigma\) (or \(\sigma\)s in two-sample scenarios).
Software is typically used to solve for the required sample size \(n\) that achieves the target power for detecting the specified effect size at level \(\alpha\).
Calculating Power
- Reject \(H_0\) Region: Determine the critical test statistic(s) that lead to rejection, based on \(\alpha\).
- Assume a Specific \(\mu\) (or Effect Size) Under \(H_a\): Find the probability that the test statistic falls in the rejection region, given this alternative is true.
- Mathematically, this requires a noncentral t distribution (or other distribution, depending on the test). Manually this is tedious, so we rely on statistical software.
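As one illustration of the steps above, the sketch below computes the power of a two-sided one-sample t test using SciPy's noncentral t distribution (`scipy.stats.nct`), with noncentrality parameter equal to the effect size times \(\sqrt{n}\). The function names and the chosen effect size, \(\alpha\), and target power are assumptions for the example, not values from the text.

```python
from math import sqrt
from scipy import stats

def one_sample_t_power(effect_size, n, alpha=0.05):
    """Power of a two-sided one-sample t test; effect_size = (mu - mu0) / sigma."""
    df = n - 1
    nc = effect_size * sqrt(n)                   # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)      # rejection region: |t| > t_crit
    # Probability of landing in either tail of the rejection region under H_a
    return stats.nct.sf(t_crit, df, nc) + stats.nct.cdf(-t_crit, df, nc)

def smallest_n_for_power(effect_size, target=0.80, alpha=0.05, n_max=10_000):
    """Smallest n (searching upward from 2) whose power reaches the target."""
    for n in range(2, n_max + 1):
        if one_sample_t_power(effect_size, n, alpha) >= target:
            return n
    raise ValueError("target power not reached within n_max")

# Hypothetical planning values: effect size 0.5, alpha 0.05, target power 80%
n_needed = smallest_n_for_power(0.5, target=0.80)
print(f"n = {n_needed}, power = {one_sample_t_power(0.5, n_needed):.3f}")
```

Raising the effect size, the sample size, or \(\alpha\) each increases the computed power, matching the factors listed above.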

7.4. Unbiasedness and Consistency of the Sample Standard Deviation

Unbiasedness of \(s^2\)

Let \(X_1, X_2, \ldots, X_n\) be i.i.d. random variables with mean \(\mu\) and variance \(\sigma^2\).
Define the sample variance:

\[s^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X})^2, \quad\text{where}\quad \bar{X} = \frac{1}{n}\sum_{i=1}^n X_i.\]

Key Result: \(\mathbb{E}[s^2] = \sigma^2\).
Here is a sketch of the proof:

\[\begin{split}\begin{aligned} \mathbb{E}[s^2] &= \mathbb{E}\!\Bigl[\frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2\Bigr] \\[6pt] &= \frac{1}{n-1}\,\mathbb{E}\!\Bigl[\sum_{i=1}^n (X_i - \mu + \mu - \bar{X})^2\Bigr]. \end{aligned}\end{split}\]

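Writing \((X_i - \bar{X})^2\) as \(\Bigl[(X_i - \mu) - (\bar{X} - \mu)\Bigr]^2\), expanding the square, and using \(\sum_{i=1}^n (X_i - \mu) = n(\bar{X} - \mu)\), the cross term equals \(-2n(\bar{X} - \mu)^2\), which gives

\[\sum_{i=1}^n (X_i - \bar{X})^2 = \sum_{i=1}^n (X_i - \mu)^2 - n(\bar{X} - \mu)^2.\]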

Taking expectations and using \(\mathbb{E}[(\bar{X} - \mu)^2] = \frac{\sigma^2}{n}\) yields

\[\mathbb{E}\Bigl[\sum_{i=1}^n (X_i - \bar{X})^2\Bigr] = \sum_{i=1}^n \mathbb{E}[(X_i - \mu)^2] - n\,\mathbb{E}[(\bar{X} - \mu)^2] = n\,\sigma^2 - n\left(\frac{\sigma^2}{n}\right) = (n-1)\,\sigma^2.\]

Hence,

\[\mathbb{E}[s^2] = \frac{1}{n-1} \,\mathbb{E}\Bigl[\sum_{i=1}^n (X_i - \bar{X})^2\Bigr] = \frac{1}{n-1} \cdot (n-1)\,\sigma^2 = \sigma^2.\]

Therefore, \(s^2\) is an unbiased estimator of \(\sigma^2\).
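A small simulation can make the result concrete. The sketch below uses only the Python standard library; the Normal population, \(\sigma^2 = 4\), \(n = 5\), the number of repetitions, and the seed are arbitrary choices for illustration.

```python
import random
from statistics import mean, variance, pvariance

random.seed(0)          # fixed seed so the run is reproducible
sigma2 = 4.0            # true population variance (assumed for the simulation)
n = 5                   # small sample size, where the bias is most visible
reps = 20_000

unbiased, biased = [], []
for _ in range(reps):
    sample = [random.gauss(10.0, sigma2 ** 0.5) for _ in range(n)]
    unbiased.append(variance(sample))    # divides by n - 1 (this is s^2)
    biased.append(pvariance(sample))     # divides by n

avg_unbiased = mean(unbiased)
avg_biased = mean(biased)
print(f"average of s^2 (divide by n-1): {avg_unbiased:.3f}")
print(f"average with divisor n:         {avg_biased:.3f}")
```

The average of \(s^2\) should land close to \(\sigma^2 = 4\), while the divisor-\(n\) version should average close to \(\frac{n-1}{n}\sigma^2 = 3.2\).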

Consistency of \(s^2\) and \(s\) (Optional)

To show consistency, we want:

\[s^2 \xrightarrow{\;p\;} \sigma^2 \quad\text{(in probability)}, \quad \text{and} \quad s \xrightarrow{\;p\;} \sigma.\]

Consistency of \(s^2\)
- By the Strong Law of Large Numbers,
  
  \[\frac{1}{n}\sum_{i=1}^n (X_i - \mu)^2 \;\to\; \sigma^2 \quad\text{almost surely.}\]
- Dividing the identity \(\sum_{i=1}^n (X_i - \bar{X})^2 = \sum_{i=1}^n (X_i - \mu)^2 - n(\bar{X} - \mu)^2\) by \(n-1\) gives
  
  \[s^2 = \frac{n}{n-1} \Bigl[\frac{1}{n}\sum_{i=1}^n (X_i - \mu)^2 - (\bar{X} - \mu)^2\Bigr],\]
  
  where \((\bar{X} - \mu)^2 \to 0\) because \(\bar{X} \to \mu\) by the law of large numbers.
- Since \(\frac{n}{n-1} \to 1\) and \(\frac{1}{n}\sum_{i=1}^n (X_i - \mu)^2 \to \sigma^2\), it follows that \(s^2 \to \sigma^2\) almost surely, and hence in probability. Thus, \(s^2\) is a consistent estimator of \(\sigma^2\).
Consistency of \(s\)
- By the Continuous Mapping Theorem, if \(s^2 \xrightarrow{\;p\;} \sigma^2\) and the square-root function \(\sqrt{\cdot}\) is continuous for \(\sigma^2>0\), then
  
  \[s = \sqrt{s^2} \;\xrightarrow{\;p\;} \sqrt{\sigma^2} = \sigma.\]
- Hence, \(s\) is a consistent estimator of \(\sigma\).
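Consistency can also be seen numerically. The following standard-library sketch draws one sample at each of several increasing sizes and reports how far \(s\) is from \(\sigma\); the Normal population, \(\sigma = 2\), the sample sizes, and the seed are arbitrary choices for illustration.

```python
import random
from statistics import stdev

random.seed(1)      # fixed seed so the run is reproducible
sigma = 2.0         # true population standard deviation (assumed for the simulation)

errors = {}
for n in (10, 100, 1_000, 10_000, 100_000):
    sample = [random.gauss(0.0, sigma) for _ in range(n)]
    s = stdev(sample)                     # sample SD with divisor n - 1
    errors[n] = abs(s - sigma)
    print(f"n = {n:>7}: s = {s:.4f}, |s - sigma| = {errors[n]:.4f}")
```

A single run is random, so the error need not shrink at every step, but for large \(n\) it should be small, as the convergence in probability suggests.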
