15 Confidence Intervals & Sampling Distributions
SLIDE DECKS
Some of the material presented in this chapter will be discussed in class. It is your responsibility to ensure you cover all the concepts presented both in class and in this textbook.
What is a Confidence Interval?
A confidence interval is a range of values used to estimate an unknown population parameter based on a sample from that population. It provides a range of plausible values for the parameter together with a level of confidence that the true parameter falls within that range. In simpler terms, a confidence interval tells you how confident you can be about the range of values within which a population parameter is likely to lie.
Here are the key components of a confidence interval:
- Sample Data: To construct a confidence interval, you start with a sample from the population of interest. This sample should be random or representative to ensure the validity of the confidence interval.
- Point Estimate: You calculate a point estimate of the population parameter using the sample data. This point estimate is typically the sample mean (for estimating a population mean) or sample proportion (for estimating a population proportion), but it can be other statistics depending on the parameter of interest.
- Margin of Error: The margin of error (half the width of the confidence interval) is a critical component of a confidence interval. It is a measure of the uncertainty associated with the point estimate. The margin of error depends on the sample size and the desired level of confidence. Typically it is calculated as the critical value of a distribution (based on your desired level of confidence) multiplied by the standard error of the point estimate; a short R sketch follows this list.
- Level of Confidence: The level of confidence, often denoted as (1 – α), represents the probability that the true population parameter falls within the calculated confidence interval. Common levels of confidence are 90%, 95%, and 99%. The choice of the level of confidence affects the width of the interval: higher confidence produces a wider interval.
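As a concrete illustration of how the margin of error is built, here is a minimal R sketch. The sample values and the 95% confidence level are made up for illustration; it multiplies a t critical value by the standard error of the sample mean.

```r
# Hypothetical sample data (values made up for illustration)
x <- c(48, 52, 47, 55, 50, 53, 49, 51, 46, 54)

n <- length(x)                      # sample size
point_estimate <- mean(x)           # point estimate of the population mean
se <- sd(x) / sqrt(n)               # standard error of the sample mean

conf_level <- 0.95                  # desired level of confidence (1 - alpha)
alpha <- 1 - conf_level
t_crit <- qt(1 - alpha / 2, df = n - 1)   # critical value from the t-distribution

margin_of_error <- t_crit * se
margin_of_error
```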
The formula for constructing a confidence interval typically looks like this:
(Point Estimate – Margin of Error, Point Estimate + Margin of Error)
The margin of error is calculated from the variability in the sample data and the chosen level of confidence. Critical values from the standard normal distribution (Z-distribution) or the t-distribution are commonly used for this calculation.
For example, if you have a sample mean of 50, a margin of error of 5, and a 95% confidence level, you would construct the confidence interval as (50 – 5, 50 + 5) = (45, 55). This means that you are 95% confident that the true population mean lies within the interval (45, 55). In other words, if you were to take many random samples and construct confidence intervals in the same way, approximately 95% of those intervals would contain the true population mean.
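The repeated-sampling interpretation can be checked by simulation. The following R sketch assumes a normal population with mean 50 and standard deviation 10 (arbitrary values for illustration), builds a 95% t interval from each of many simulated samples, and reports the proportion of intervals that contain the true mean, which should be close to 0.95.

```r
set.seed(123)                        # for reproducibility

true_mean <- 50                      # assumed population mean (illustrative)
true_sd   <- 10                      # assumed population standard deviation
n         <- 25                      # sample size per simulated sample
n_sims    <- 10000                   # number of repeated samples

covered <- replicate(n_sims, {
  x      <- rnorm(n, mean = true_mean, sd = true_sd)
  se     <- sd(x) / sqrt(n)
  t_crit <- qt(0.975, df = n - 1)
  lower  <- mean(x) - t_crit * se
  upper  <- mean(x) + t_crit * se
  lower <= true_mean && true_mean <= upper   # TRUE if the interval captures the mean
})

mean(covered)   # proportion of intervals containing the true mean (about 0.95)
```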
What is a Sampling Distribution?
A sampling distribution is a theoretical probability distribution that represents the possible values of a sample statistic (e.g., mean, variance, proportion) when samples are repeatedly drawn from a population. In other words, it describes the distribution of sample statistics that you would obtain if you were to take many random samples from the same population and calculate a specific statistic for each sample.
Here are some key points about sampling distributions:
- Purpose: Sampling distributions are used to make inferences about population parameters based on sample statistics. They help us understand the variability and behaviour of sample statistics when sampling from a population. Another way to think about this is that they let us work out probabilities for a statistic computed from a whole sample (the group) rather than for a single observation (the individual).
- Sample Statistic: The specific sample statistic of interest depends on the research question. Common examples include the sample mean, sample variance, sample proportion, and sample correlation coefficient.
- Central Limit Theorem: One of the fundamental principles in statistics is the Central Limit Theorem (CLT). It states that, under certain conditions, the sampling distribution of the sample mean (or sum) approaches a normal distribution as the sample size increases, regardless of the shape of the underlying population distribution. This property is extremely useful because it allows us to make inferences about population means using the normal distribution. This is explored in the lab: Types of Data and Simulations, and a short simulation sketch follows this list.
- Shape and Characteristics: The shape and characteristics of a sampling distribution depend on both the population distribution and the sample size. When the sample size is sufficiently large (often around 30 or more for practical purposes), the sampling distribution of the sample mean becomes approximately normal, even if the population distribution is not.
- Standard Error: The standard error (SE) is a measure of the variability of the sample statistic in the sampling distribution. It quantifies how much a sample statistic is expected to vary from one sample to another. For the sample mean, the standard error is calculated as the population standard deviation divided by the square root of the sample size (in practice, the sample standard deviation is used when the population value is unknown); the simulation after this list compares this formula to the spread of simulated sample means.
- Confidence Intervals and Hypothesis Testing: Sampling distributions play a crucial role in constructing confidence intervals and conducting hypothesis tests. Confidence intervals are based on the variability of the sample statistic in the sampling distribution, and hypothesis tests often involve comparing sample statistics to their expected values under the null hypothesis.
- Practical Implications: Understanding sampling distributions helps researchers and analysts make more informed decisions and draw valid conclusions from sample data. It also guides the interpretation of statistical tests and results.
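To see the Central Limit Theorem and the standard error at work, here is a small R simulation sketch. The exponential population, the sample size of 30, and the number of repetitions are arbitrary choices for illustration: it draws many samples from a skewed population, plots a histogram of the resulting sample means, and compares their spread to the theoretical standard error (population standard deviation divided by the square root of the sample size).

```r
set.seed(42)                         # for reproducibility

n      <- 30                         # sample size (around the usual rule of thumb)
n_sims <- 10000                      # number of repeated samples

# Skewed (exponential) population with rate 1: mean = 1, standard deviation = 1
sample_means <- replicate(n_sims, mean(rexp(n, rate = 1)))

# Sampling distribution of the mean: roughly normal despite the skewed population
hist(sample_means, breaks = 50,
     main = "Sampling distribution of the sample mean (n = 30)",
     xlab = "Sample mean")

# Empirical spread of the sample means vs. theoretical standard error
sd(sample_means)   # standard deviation of the simulated sample means
1 / sqrt(n)        # population sd (1) divided by sqrt(n)
```

The histogram should look approximately bell-shaped even though individual exponential observations are strongly right-skewed, and the two printed values should be very close to each other.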
Activity
You can explore confidence intervals and sampling distributions using the following R scripts (in RStudio). You will see these scripts in class and in Lab 4.