Sample Size Calculator
Calculate the minimum sample size needed or find the margin of error for your survey
General Statistics Introduction
In statistics, information about a population is often inferred by studying a finite number of individuals from that population: the population is sampled, and the characteristics of the sample are assumed to be representative of the overall population.
For the following, assume a population in which some proportion, p, of individuals is distinguishable from the remaining 1-p in some way; e.g., p may be the proportion of individuals who have brown hair, while the remaining 1-p have black, blond, red, or other hair colors. To estimate p, a sample of n individuals can be taken from the population, and the sample proportion, p̂, of sampled individuals who have brown hair calculated. Unless the full population is sampled, however, the estimate p̂ will most likely not equal the true value p, since p̂ suffers from sampling noise: it depends on the particular individuals that were sampled. Sampling statistics can be used to calculate what are called confidence intervals, which indicate how close the estimate p̂ is likely to be to the true value p.
Statistics of a Random Sample
The uncertainty in a given random sample (namely, that the proportion estimate p̂ is expected to be a good, but not perfect, approximation of the true proportion p) can be summarized by saying that, for a sufficiently large n, the estimate p̂ is approximately normally distributed with mean p and variance p(1-p)/n. For an explanation of why the sample estimate is approximately normally distributed, see the Central Limit Theorem.
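A quick simulation can make this concrete. The sketch below assumes a hypothetical true proportion p = 0.3 and sample size n = 200, and assumes each sampled individual independently has the trait with probability p (as when sampling from a very large population). It draws many samples with Python's standard `random` module and compares the mean and variance of the resulting p̂ values with p and p(1-p)/n.

```python
import random
import statistics

def simulate_sample_proportions(p, n, trials, seed=0):
    """Draw `trials` samples of size n from a population with true proportion p
    and return the sample proportion p-hat of each one."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    estimates = []
    for _ in range(trials):
        # Each individual independently has the trait with probability p.
        hits = sum(1 for _ in range(n) if rng.random() < p)
        estimates.append(hits / n)
    return estimates

p, n = 0.3, 200  # hypothetical values chosen for illustration
estimates = simulate_sample_proportions(p, n, trials=5000)

print("mean of p-hat:    ", statistics.fmean(estimates))      # close to p
print("variance of p-hat:", statistics.pvariance(estimates))  # close to p(1-p)/n
print("p(1-p)/n:         ", p * (1 - p) / n)
```

Plotting a histogram of `estimates` would show the bell shape predicted by the Central Limit Theorem.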
As defined below, the confidence level, confidence interval, and sample size are all calculated with respect to this sampling distribution. In short, the confidence interval gives an interval around p in which an estimate p̂ is "likely" to be, and the confidence level gives just how "likely" this is; e.g., a 95% confidence level indicates that the estimate p̂ is expected to lie in the confidence interval for 95% of the random samples that could be taken. The confidence interval depends on the sample size, n (the variance of the sampling distribution is inversely proportional to n, so the estimate tends to get closer to the true proportion as n increases). Thus, an acceptable error in the estimate, called the margin of error, ε, can be chosen, and the equations solved for the sample size required for the confidence interval to be no wider than ±ε; this is known as "sample size calculation."
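Because the variance is p(1-p)/n, the standard error √(p(1-p)/n) shrinks in proportion to 1/√n, so quadrupling the sample size only halves the typical error. The short sketch below illustrates this for a hypothetical p = 0.5, where the standard error falls from 0.05 at n = 100 to 0.025 at n = 400 and 0.0125 at n = 1600. The helper name `standard_error` is our own.

```python
import math

def standard_error(p, n):
    """Standard deviation of the sampling distribution of p-hat."""
    return math.sqrt(p * (1 - p) / n)

p = 0.5  # hypothetical proportion used for illustration
for n in (100, 400, 1600):
    # Each fourfold increase in n halves the standard error.
    print(n, round(standard_error(p, n), 4))
```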
Confidence Level
The confidence level is a measure of certainty regarding how accurately a sample reflects the population being studied within a chosen confidence interval. The most commonly used confidence levels are 90%, 95%, and 99%. Each confidence level has a corresponding z-score, which can be found using an equation or from widely available tables such as the one provided below.
Note that using z-scores assumes that the sampling distribution is normal, as described above in "Statistics of a Random Sample." If an experiment or survey is repeated many times, the confidence level indicates the percentage of those repetitions in which the resulting interval will contain the true value.
Confidence Level z-score (t) Table
| Confidence Level | z-score (t) |
|---|---|
| 70% | 1.04 |
| 75% | 1.15 |
| 80% | 1.28 |
| 85% | 1.44 |
| 90% | 1.645 |
| 92% | 1.75 |
| 95% | 1.96 |
| 96% | 2.05 |
| 98% | 2.33 |
| 99% | 2.58 |
| 99.9% | 3.29 |
| 99.99% | 3.89 |
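The z-scores in the table are two-sided critical values of the standard normal distribution: for a confidence level c, z is the point with (1 + c)/2 of the probability below it, leaving (1 - c)/2 in each tail. Python's standard library can compute them with `statistics.NormalDist().inv_cdf`; the helper name `z_score` below is our own.

```python
from statistics import NormalDist

def z_score(confidence_level):
    """Two-sided critical value for a confidence level given as a fraction (e.g. 0.95)."""
    # Leave (1 - c)/2 of the probability in each tail of the standard normal.
    return NormalDist().inv_cdf((1 + confidence_level) / 2)

for level in (0.70, 0.80, 0.90, 0.95, 0.99, 0.999, 0.9999):
    print(f"{level:.2%}  z = {z_score(level):.3f}")
```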
Confidence Interval
A confidence interval is an estimated range of likely values for a population parameter, for example, 40 ± 2 or 40 ± 5%. Taking the commonly used 95% confidence level as an example, if the same population were sampled many times and an interval estimate made on each occasion, the true population parameter would be contained within the interval in approximately 95% of the cases. Note that the 95% probability refers to the reliability of the estimation procedure and not to a specific interval. Once an interval is calculated, it either contains or does not contain the population parameter of interest. Factors that affect the width of a confidence interval include the sample size, the confidence level, and the variability within the sample.
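The "approximately 95% of the cases" interpretation can be checked by simulation. The sketch below assumes a hypothetical true proportion p = 0.4 and samples of size n = 500, builds the interval p̂ ± 1.96 × √(p̂(1-p̂)/n) for each simulated sample, and counts how often that interval contains p. Expect a fraction close to, but not exactly, 0.95, since the normal approximation is not exact and the simulation has its own sampling noise.

```python
import math
import random

def coverage(p, n, z, trials, seed=1):
    """Fraction of simulated samples whose confidence interval contains the true p."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    covered = 0
    for _ in range(trials):
        p_hat = sum(1 for _ in range(n) if rng.random() < p) / n
        half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - half_width <= p <= p_hat + half_width:
            covered += 1
    return covered / trials

rate = coverage(p=0.4, n=500, z=1.96, trials=4000)  # hypothetical p and n
print(rate)
```

Each individual interval either contains p or does not; the 95% describes the long-run rate across repetitions.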
Different equations are used to calculate confidence intervals depending on factors such as whether the standard deviation is known and whether small samples (n < 30) are involved. The calculator provided on this page calculates the confidence interval for a proportion and uses the following equations:
Unlimited population:
CI = p̂ ± z × √(p̂(1-p̂)/n)
Finite population:
CI = p̂ ± z × √(p̂(1-p̂)/n) × √((N-n)/(N-1))
Where:
- z is z-score (see z-score table)
- p̂ is the sample proportion
- n is the sample size
- N is the population size
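The two equations above can be sketched as a single Python function. The name `proportion_ci` and its parameters are hypothetical; passing `population_size` applies the finite population correction, and omitting it gives the unlimited-population interval.

```python
import math

def proportion_ci(successes, n, z, population_size=None):
    """Return (lower, upper) for the confidence interval of a proportion.

    successes: number of sampled individuals with the trait
    n: sample size
    z: z-score for the chosen confidence level (e.g. 1.96 for 95%)
    population_size: N for a finite population, or None for an unlimited one
    """
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    if population_size is not None:
        N = population_size
        # Finite population correction: sqrt((N - n) / (N - 1)).
        margin *= math.sqrt((N - n) / (N - 1))
    return p_hat - margin, p_hat + margin

print(proportion_ci(85, 120, 2.58))                       # unlimited population
print(proportion_ci(85, 120, 2.58, population_size=500))  # hypothetical N = 500
```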
In statistics, a population is a set of events or elements that have some relevance regarding a given question or experiment. It can refer to an existing group of objects, systems, or even a hypothetical group of objects. Most commonly, however, population is used to refer to a group of people, such as the employees of a company, the people within a certain age group in some geographic area, or the students in a university's library at any given time.
It is important to note that the equation needs to be adjusted when considering a finite population, as shown above. The √((N-n)/(N-1)) term in the finite population equation is referred to as the finite population correction factor. It is necessary because, when sampling without replacement from a finite population, the individuals in a sample cannot be assumed to be independent. For example, if the study population consists of 10 people in a room with ages ranging from 1 to 100, and one of those chosen is 100 years old, the next person chosen is more likely to be younger. The finite population correction factor accounts for this dependence.
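To see the size of the correction, the sketch below evaluates √((N-n)/(N-1)) for a hypothetical population of N = 1,000 at several sample sizes. The factor stays close to 1 when the sample is a small fraction of the population and falls to 0 when n = N, since sampling everyone leaves no sampling error. The function name is our own.

```python
import math

def finite_population_correction(n, N):
    """Multiplier applied to the standard error when sampling without replacement."""
    if not 1 <= n <= N:
        raise ValueError("sample size must be between 1 and the population size")
    return math.sqrt((N - n) / (N - 1))

N = 1000  # hypothetical population size
for n in (10, 100, 500, 1000):
    print(n, round(finite_population_correction(n, N), 3))
```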
Example:
Calculate the 99% confidence interval for the proportion of coffee drinkers at Company Q, where 85 out of 120 people surveyed are coffee drinkers.
p̂ = 85/120 = 0.70833
z = 2.58 (for 99% confidence level)
n = 120
CI = 0.70833 ± 2.58 × √(0.70833 × (1-0.70833) / 120)
CI = 0.70833 ± 0.107
CI = 70.833% ± 10.71%
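The arithmetic in this example can be reproduced step by step. The sketch below follows the same equations and also reports the interval's endpoints, roughly 60.1% to 81.5%.

```python
import math

successes, n = 85, 120  # coffee drinkers surveyed at Company Q
z = 2.58                # z-score for a 99% confidence level

p_hat = successes / n                            # 0.70833...
margin = z * math.sqrt(p_hat * (1 - p_hat) / n)  # about 0.107
lower, upper = p_hat - margin, p_hat + margin

print(f"CI = {p_hat:.3%} ± {margin:.2%}")
print(f"     ({lower:.2%} to {upper:.2%})")
```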
Sample Size Calculation
Sample size determination is the process of choosing the number of observations or replicates (repetitions of an experimental condition used to estimate the variability of a phenomenon) to include in a statistical sample. It is an important aspect of any empirical study that requires inferences to be made about a population based on a sample, since the sample is the part of the population actually chosen for the survey or experiment. To carry out this calculation, set the margin of error, ε, which is the maximum distance desired between the sample estimate and the true value. Then take the confidence interval equation above, set the term to the right of the ± sign equal to the margin of error, and solve the resulting equation for the sample size, n. The equations for calculating sample size are shown below.
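Written out for the unlimited-population case, setting the margin term equal to ε and solving for n takes two steps: square both sides, then rearrange.

```latex
\varepsilon = z\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}
\;\Longrightarrow\;
\varepsilon^2 = \frac{z^2\,\hat{p}(1-\hat{p})}{n}
\;\Longrightarrow\;
n = \frac{z^2\,\hat{p}(1-\hat{p})}{\varepsilon^2}
```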
Unlimited population:
n = z² × p̂(1-p̂) / ε²
Finite population:
n = (z² × p̂(1-p̂) / ε²) / (1 + (z² × p̂(1-p̂) / (ε² × N)))
Where:
- z is z-score (see z-score table)
- ε is the margin of error
- N is the population size
- p̂ is the estimated (or assumed) population proportion
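Both sample size equations can be combined into one helper. The sketch below assumes the result is rounded up to the next whole person, since a fractional sample is not possible and rounding down would miss the target margin of error. The function name `sample_size` and its parameters are hypothetical.

```python
import math

def sample_size(z, p_hat, margin_of_error, population_size=None):
    """Minimum sample size for estimating a proportion.

    z: z-score for the chosen confidence level
    p_hat: estimated (or assumed) population proportion
    margin_of_error: epsilon, as a fraction (0.05 means 5%)
    population_size: N for a finite population, or None for an unlimited one
    """
    # Unlimited-population sample size: z^2 * p(1-p) / epsilon^2
    n0 = z ** 2 * p_hat * (1 - p_hat) / margin_of_error ** 2
    if population_size is not None:
        # Finite-population version: n0 / (1 + n0 / N)
        n0 = n0 / (1 + n0 / population_size)
    return math.ceil(n0)

print(sample_size(1.96, 0.5, 0.05))                        # unlimited population
print(sample_size(1.96, 0.5, 0.05, population_size=2000))  # hypothetical N = 2000
```

A finite population always needs fewer (or equally many) respondents than the unlimited case, because the denominator 1 + n0/N is greater than 1.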
Example:
Determine the sample size necessary to estimate the proportion of people shopping at a supermarket in the U.S. who identify as vegan, with 95% confidence and a margin of error of 5%. Assume a population proportion of 0.5 and an unlimited population size.
z = 1.96 (for 95% confidence level)
p̂ = 0.5
ε = 0.05
n = 1.96² × 0.5 × (1-0.5) / 0.05²
n = 3.8416 × 0.25 / 0.0025
n = 0.9604 / 0.0025
n = 384.16
Therefore, at least 385 people would be necessary (384.16 is rounded up, since a sample size must be a whole number).
Note: If the population proportion is unknown, using p̂ = 0.5 is the most conservative choice, since p̂(1-p̂) is largest at p̂ = 0.5, which maximizes the required sample size.
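To see how much the assumed proportion matters, the sketch below applies the unlimited-population equation with the example's 95% confidence level (z = 1.96) and 5% margin of error to several hypothetical values of p̂. The helper name is our own.

```python
import math

def unlimited_sample_size(z, p_hat, margin_of_error):
    """n = z^2 * p_hat * (1 - p_hat) / epsilon^2, rounded up to a whole person."""
    return math.ceil(z ** 2 * p_hat * (1 - p_hat) / margin_of_error ** 2)

for p_hat in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(p_hat, unlimited_sample_size(1.96, p_hat, 0.05))
```

The results are symmetric around p̂ = 0.5, where the largest sample is needed, so an assumed 0.5 is never too small.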