What do you notice about the population mean and the mean of the sampling?

A sampling distribution is the probability distribution of a statistic obtained from a large number of samples drawn from a specific population. It describes how often each possible value of the statistic would occur across those samples.

In statistics, a population is the entire pool from which a statistical sample is drawn. A population may refer to an entire group of people, objects, events, hospital visits, or measurements. A population can thus be said to be an aggregate observation of subjects grouped together by a common feature.

  • A sampling distribution is a probability distribution of a statistic that is obtained through repeated sampling of a specific population.
  • It describes a range of possible outcomes for a statistic, such as the mean or mode of some variable, of a population.
  • The majority of data analyzed by researchers are actually samples, not populations.

Understanding Sampling Distribution

Much of the data drawn and used by academics, statisticians, researchers, marketers, analysts, and others comes from samples, not populations. A sample is a subset of a population. For example, a medical researcher who wants to compare the average weight of all babies born in North America from 1995 to 2005 with that of babies born in South America over the same period cannot, within a reasonable amount of time, gather data for the entire population of over a million births that occurred during those ten years. They will instead use the weights of, say, 100 babies on each continent to draw a conclusion. The 100 weights used are the sample, and the average weight calculated from them is the sample mean.

Now suppose that instead of taking just one sample of 100 newborn weights from each continent, the medical researcher takes repeated random samples from the general population and computes the sample mean for each sample group. So, for North America, they pull newborn weights recorded in the U.S., Canada, and Mexico as follows: four samples of 100 from select hospitals in the U.S., five samples of 70 from Canada, and three samples of 150 from Mexico, for a total of 1,200 newborn weights grouped in 12 sets. They also collect a sample of 100 birth weights from each of the 12 countries in South America.

Each sample has its own sample mean, and the distribution of the sample means is known as the sampling distribution.

The distribution of the average weights computed across the sample sets is the sampling distribution of the mean. The mean is not the only statistic that can be calculated from a sample. Other statistics, such as the standard deviation, variance, proportion, and range, can also be calculated from sample data, and each has its own sampling distribution. The standard deviation and variance measure the variability of the sampling distribution.

The number of observations in a population, the number of observations in a sample, and the procedure used to draw the sample sets determine the variability of a sampling distribution. The standard deviation of a sampling distribution is called the standard error. While the mean of a sampling distribution is equal to the mean of the population, the standard error depends on the standard deviation of the population, the size of the population, and the size of the sample.

Knowing how far the means of the sample sets are spread from each other and from the population mean indicates how close a given sample mean is likely to be to the population mean. The standard error of the sampling distribution decreases as the sample size increases.
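The shrinking standard error can be checked with a small simulation. The sketch below is only illustrative: the population of 10,000 birth weights is made up (normal with mean 7 pounds and standard deviation 1.2), samples are drawn with replacement, and the helper name simulate_sample_means is hypothetical.

```python
import random
import statistics


def simulate_sample_means(population, sample_size, num_samples, rng):
    """Draw repeated random samples (with replacement) and return their means."""
    return [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(num_samples)
    ]


rng = random.Random(42)

# Hypothetical population: 10,000 simulated birth weights in pounds.
population = [rng.gauss(7.0, 1.2) for _ in range(10_000)]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

results = {}
for n in (10, 100, 1000):
    means = simulate_sample_means(population, n, 2000, rng)
    # Mean and standard deviation of the simulated sampling distribution.
    results[n] = (statistics.fmean(means), statistics.pstdev(means))
    print(
        f"n={n:5d}  mean of sample means={results[n][0]:.3f}  "
        f"simulated SE={results[n][1]:.4f}  "
        f"sigma/sqrt(n)={pop_sd / n ** 0.5:.4f}"
    )
print(f"population mean={pop_mean:.3f}")
```

In each row the mean of the sample means should sit close to the population mean, while the simulated standard error should track the population standard deviation divided by the square root of the sample size.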

Special Considerations

A population or a single sample of numbers is often roughly normally distributed, although this is not guaranteed. A sampling distribution built from only a few samples, however, will not necessarily have a bell-curved shape.

Following our example, the population average weight of babies in North America and in South America has a normal distribution because some babies will be underweight (below the mean) or overweight (above the mean), with most babies falling in between (around the mean). If the average weight of newborns in North America is seven pounds, the sample mean weight in each of the 12 sets of sample observations recorded for North America will be close to seven pounds as well.

However, if you graph the averages calculated for each of the 12 sample groups, the result may look like a uniform distribution, and it is difficult to predict with certainty what the actual shape will turn out to be. The more samples the researcher draws from the population of over a million weight figures, the more closely the graph will resemble a normal distribution.

In Note 6.5 "Example 1" in Section 6.1 "The Mean and Standard Deviation of the Sample Mean" we constructed the probability distribution of the sample mean for samples of size two drawn from the population of four rowers. The probability distribution is:

$\bar{x}$       152    154    156    158    160    162    164
$P(\bar{x})$    1/16   2/16   3/16   4/16   3/16   2/16   1/16
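A distribution like this can be rebuilt by listing every possible sample. The sketch below assumes the four population values are 152, 156, 160, and 164 and that samples of size two are drawn with replacement. Neither detail is stated in this section; they are chosen because they reproduce the probabilities above.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Assumed population values (not given in this section).
population = [152, 156, 160, 164]

# All ordered samples of size two drawn with replacement: 16 equally likely pairs.
samples = list(product(population, repeat=2))
counts = Counter(Fraction(a + b, 2) for a, b in samples)

# Probability of each possible sample mean.
distribution = {
    mean: Fraction(count, len(samples))
    for mean, count in sorted(counts.items())
}

for mean in distribution:
    print(f"{int(mean)}: {counts[mean]}/{len(samples)}")

# Mean of the sampling distribution versus the population mean.
mean_of_sample_means = sum(mean * prob for mean, prob in distribution.items())
population_mean = Fraction(sum(population), len(population))
print(mean_of_sample_means, population_mean)
```

Under these assumptions the mean of the sampling distribution and the population mean come out equal, which is the point the recurring question in this section is driving at.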

Figure 6.1 "Distribution of a Population and a Sample Mean" shows a side-by-side comparison of a histogram for the original population and a histogram for this distribution. Whereas the distribution of the population is uniform, the sampling distribution of the mean has a shape approaching the shape of the familiar bell curve. This phenomenon of the sampling distribution of the mean taking on a bell shape even though the population distribution is not bell-shaped happens in general. Here is a somewhat more realistic example.

Figure 6.1 Distribution of a Population and a Sample Mean

Suppose we take samples of size 1, 5, 10, or 20 from a population that consists entirely of the numbers 0 and 1, half the population 0, half 1, so that the population mean is 0.5. The sampling distributions are:

n = 1:

$\bar{x}$       0      1
$P(\bar{x})$    0.5    0.5

n = 5:

$\bar{x}$       0      0.2    0.4    0.6    0.8    1
$P(\bar{x})$    0.03   0.16   0.31   0.31   0.16   0.03

n = 10:

$\bar{x}$       0      0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9    1
$P(\bar{x})$    0.00   0.01   0.04   0.12   0.21   0.25   0.21   0.12   0.04   0.01   0.00

n = 20:

$\bar{x}$       0      0.05   0.10   0.15   0.20   0.25   0.30   0.35   0.40   0.45   0.50
$P(\bar{x})$    0.00   0.00   0.00   0.00   0.00   0.01   0.04   0.07   0.12   0.16   0.18

$\bar{x}$       0.55   0.60   0.65   0.70   0.75   0.80   0.85   0.90   0.95   1
$P(\bar{x})$    0.16   0.12   0.07   0.04   0.01   0.00   0.00   0.00   0.00   0.00
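These probabilities can be reproduced with the binomial formula. The sketch assumes each draw is independent and equally likely to be 0 or 1 (for instance, a very large population or sampling with replacement), so the chance that the sample mean equals k/n is the number of ways to choose k ones out of n draws divided by 2^n. The function name sampling_distribution_binary is hypothetical.

```python
from math import comb


def sampling_distribution_binary(n):
    """Map each possible sample mean k/n to its probability C(n, k) / 2**n."""
    return {k / n: comb(n, k) / 2 ** n for k in range(n + 1)}


for n in (1, 5, 10, 20):
    dist = sampling_distribution_binary(n)
    print(f"n = {n}")
    for mean, prob in dist.items():
        # Two decimals, matching the rounding used in the tables.
        print(f"  mean {mean:.2f}  probability {prob:.2f}")
```

Rounded to two decimals, the output should match the tabulated values, and for every n the distribution is symmetric about the population mean of 0.5.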

Histograms illustrating these distributions are shown in Figure 6.2 "Distributions of the Sample Mean".

Figure 6.2 Distributions of the Sample Mean

As n increases the sampling distribution of $\bar{X}$ evolves in an interesting way: the probabilities on the lower and the upper ends shrink and the probabilities in the middle become larger in relation to them. If we were to continue to increase n then the shape of the sampling distribution would become smoother and more bell-shaped.

What we are seeing in these examples does not depend on the particular population distributions involved. In general, one may start with any distribution and the sampling distribution of the sample mean will increasingly resemble the bell-shaped normal curve as the sample size increases. This is the content of the Central Limit Theorem.

The Central Limit Theorem

For samples of size 30 or more, the sample mean is approximately normally distributed, with mean $\mu_{\bar{X}} = \mu$ and standard deviation $\sigma_{\bar{X}} = \sigma/\sqrt{n}$, where n is the sample size. The larger the sample size, the better the approximation.

The Central Limit Theorem is illustrated for several common population distributions in Figure 6.3 "Distribution of Populations and Sample Means".

Figure 6.3 Distribution of Populations and Sample Means

The dashed vertical lines in the figures locate the population mean. Regardless of the distribution of the population, as the sample size is increased the shape of the sampling distribution of the sample mean becomes increasingly bell-shaped, centered on the population mean. Typically by the time the sample size is 30 the distribution of the sample mean is practically the same as a normal distribution.
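The same behavior can be seen in a quick simulation with a strongly skewed population. The sketch below is an illustration under assumptions not taken from the text: the population is exponential with mean 1 and standard deviation 1, the sample size is 30, and 20,000 samples are drawn.

```python
import random
import statistics

rng = random.Random(1)

# Assumed skewed population: exponential with mean 1 and standard deviation 1.
mu, sigma = 1.0, 1.0
n = 30
num_samples = 20_000

sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)
predicted_sd = sigma / n ** 0.5

# A normal distribution puts about 68% of its mass within one
# standard deviation of its mean.
within_one_sd = sum(
    abs(m - mu) <= predicted_sd for m in sample_means
) / num_samples

print(f"mean of sample means: {mean_of_means:.3f} (population mean {mu})")
print(f"SD of sample means:   {sd_of_means:.4f} (sigma/sqrt(n) = {predicted_sd:.4f})")
print(f"share within one SD:  {within_one_sd:.3f}")
```

Even though individual draws are far from bell-shaped, the sample means should center on the population mean, with spread close to sigma/sqrt(n) and roughly 68% of them within one standard deviation of the mean.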

The importance of the Central Limit Theorem is that it allows us to make probability statements about the sample mean, specifically in relation to its value in comparison to the population mean, as we will see in the examples. But to use the result properly we must first realize that there are two separate random variables (and therefore two probability distributions) at play:

  1. $X$, the measurement of a single element selected at random from the population; the distribution of $X$ is the distribution of the population, with mean the population mean $\mu$ and standard deviation the population standard deviation $\sigma$;
  2. $\bar{X}$, the mean of the measurements in a sample of size n; the distribution of $\bar{X}$ is its sampling distribution, with mean $\mu_{\bar{X}} = \mu$ and standard deviation $\sigma_{\bar{X}} = \sigma/\sqrt{n}$.

Example 3

Let $\bar{X}$ be the mean of a random sample of size 50 drawn from a population with mean 112 and standard deviation 40.

  1. Find the mean and standard deviation of $\bar{X}$.
  2. Find the probability that $\bar{X}$ assumes a value between 110 and 114.
  3. Find the probability that $\bar{X}$ assumes a value greater than 113.

Solution

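  1. By the formulas in the previous section

    $\mu_{\bar{X}} = \mu = 112$ and $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{40}{\sqrt{50}} = 5.65685$
  2. Since the sample size is at least 30, the Central Limit Theorem applies: $\bar{X}$ is approximately normally distributed. We compute probabilities using Figure 12.2 "Cumulative Normal Probability" in the usual way, just being careful to use $\sigma_{\bar{X}}$ and not $\sigma$ when we standardize:

    $P(110 < \bar{X} < 114) = P\left(\dfrac{110 - \mu_{\bar{X}}}{\sigma_{\bar{X}}} < Z < \dfrac{114 - \mu_{\bar{X}}}{\sigma_{\bar{X}}}\right) = P\left(\dfrac{110 - 112}{5.65685} < Z < \dfrac{114 - 112}{5.65685}\right) = P(-0.35 < Z < 0.35) = 0.6368 - 0.3632 = 0.2736$
  3. Similarly

    $P(\bar{X} > 113) = P\left(Z > \dfrac{113 - \mu_{\bar{X}}}{\sigma_{\bar{X}}}\right) = P\left(Z > \dfrac{113 - 112}{5.65685}\right) = P(Z > 0.18) = 1 - P(Z < 0.18) = 1 - 0.5714 = 0.4286$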
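The calculations in Example 3 can be cross-checked in software. The sketch below uses NormalDist from Python's standard library and takes the normal approximation as given. Because it does not round z to two decimals, its results can differ from table-based answers in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 112, 40, 50

# Approximate sampling distribution of the sample mean under the CLT.
standard_error = sigma / sqrt(n)
x_bar = NormalDist(mu=mu, sigma=standard_error)

# Part 2: probability that the sample mean falls between 110 and 114.
p_between = x_bar.cdf(114) - x_bar.cdf(110)

# Part 3: probability that the sample mean exceeds 113.
p_greater = 1 - x_bar.cdf(113)

print(f"standard error = {standard_error:.5f}")
print(f"P(110 < sample mean < 114) = {p_between:.4f}")
print(f"P(sample mean > 113) = {p_greater:.4f}")
```

The key step is the same as in the hand calculation: the distribution is built with the standard error, not the population standard deviation.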

Note that if in Note 6.11 "Example 3" we had been asked to compute the probability that the value of a single randomly selected element of the population exceeds 113, that is, to compute the number $P(X > 113)$, we would not have been able to do so, since we do not know the distribution of $X$, but only that its mean is 112 and its standard deviation is 40. By contrast we could compute $P(\bar{X} > 113)$ even without complete knowledge of the distribution of $X$ because the Central Limit Theorem guarantees that $\bar{X}$ is approximately normal.

Example 4

The numerical population of grade point averages at a college has mean 2.61 and standard deviation 0.5. If a random sample of size 100 is taken from the population, what is the probability that the sample mean will be between 2.51 and 2.71?
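No solution is given here, so the following is a reconstruction that follows the same steps as Example 3. Since the sample size of 100 is at least 30, the Central Limit Theorem applies, and the cumulative probabilities for Z = 2 and Z = -2 are the standard normal table values 0.9772 and 0.0228.

```latex
\begin{align*}
\mu_{\bar{X}} &= \mu = 2.61, \qquad
\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{0.5}{\sqrt{100}} = 0.05 \\
P(2.51 < \bar{X} < 2.71)
  &= P\left(\frac{2.51 - 2.61}{0.05} < Z < \frac{2.71 - 2.61}{0.05}\right) \\
  &= P(-2 < Z < 2) \\
  &= 0.9772 - 0.0228 = 0.9544
\end{align*}
```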

What can you say about the population mean and the mean of the sampling distribution of the mean?

If the population is normal to begin with then the sample mean also has a normal distribution, regardless of the sample size. For samples of any size drawn from a normally distributed population, the sample mean is normally distributed, with mean $\mu_{\bar{X}} = \mu$ and standard deviation $\sigma_{\bar{X}} = \sigma/\sqrt{n}$, where n is the sample size.

What is the relationship between population and sample mean?

A population is the entire group that you want to draw conclusions about. A sample is the specific group that you will collect data from. The size of the sample is always less than the total size of the population. In research, a population doesn't always refer to people.

What is the mean of a sampling distribution of sample means if a population has a mean of?

The mean of the sampling distribution of the mean is the mean of the population from which the scores were sampled. Therefore, if a population has a mean μ, then the mean of the sampling distribution of the mean is also μ.
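A short derivation shows why this holds. It assumes the sample consists of n observations $X_1, \dots, X_n$, each with mean $\mu$ and standard deviation $\sigma$; the variance step additionally assumes the observations are independent.

```latex
\begin{align*}
E[\bar{X}] &= E\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right]
  = \frac{1}{n}\sum_{i=1}^{n} E[X_i]
  = \frac{1}{n}\,(n\mu) = \mu \\
\operatorname{Var}(\bar{X}) &= \frac{1}{n^2}\sum_{i=1}^{n} \operatorname{Var}(X_i)
  = \frac{1}{n^2}\,(n\sigma^2) = \frac{\sigma^2}{n}
  \quad\Longrightarrow\quad
  \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}
\end{align*}
```

The first line uses only linearity of expectation, so the mean of the sampling distribution equals the population mean for any sample size and any population shape.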