Research Sample Size: When Does Bigger Stop Being Better?

When it comes to quantitative market research, our natural products clients frequently ask us about the size of the survey sample. Can we get by with surveying 150 people, or do we need 300?

Since there is a direct correlation between cost and size (the more respondents, the higher the panel costs), clients don’t want to pay for more respondents than they need in order to get a high degree of confidence in the results. And who could blame them?

So what difference will it make if you get 1,000 respondents or 2,000? The short answer is that it depends on many variables. So what are those variables?

Confidence Interval and Confidence Level Defined

Part of what determines the necessary sample size is the level of confidence you want to have about the data.

The population is all the people out there whom you are trying to draw a conclusion about, such as consumers who buy your type of product in the natural channel. We can’t talk to all of them, and there might be many thousands of them, so we need to talk to a few of them (a sample) to draw a conclusion about what the larger population thinks.

The confidence interval (also called margin of error) is the range that the real value of the larger population is said to fall in. You might know it as the + or – % that comes after the results (e.g., 80% of consumers said they liked a given brand description, with a margin of error of +/- 5%).

The confidence level is the level of confidence you have that the value for the population at large falls within the confidence interval. If we choose a 95% confidence level, then if we ran the same survey many times, about 95% of the resulting confidence intervals would contain the real population value.

In the example above, we can be 95% confident that if we talked to everyone in the larger population, the answer we would get from tallying all of their responses would fall between 75% and 85%. So you would have a pretty good idea from the research that somewhere between 75% and 85% of your consumers like your new brand description.

Why Not Choose a Higher Confidence Level, Like 99% or 99.999%?

It’s common to choose the 95% confidence level since it offers high confidence without requiring as large a sample as a 99% confidence level would.

For example, when studying a population of 50,000 people with a confidence interval of +/- 5%, you only need 381 people if you’re using a confidence level of 95%. If you wanted a confidence level of 99%, then you would need roughly 655 people, or about 70% more. That also means roughly 70% higher panel costs for a 4-percentage-point increase in the confidence level.
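For readers who want to try other combinations, here is a minimal Python sketch of the standard sample size calculation for a survey percentage. It is not necessarily the exact calculator behind the figures in this article. The function name required_sample_size is hypothetical, and the sketch assumes the most conservative case (a 50/50 split in answers) plus a finite population correction, so results can differ by a respondent or two depending on rounding.

```python
import math
from statistics import NormalDist


def required_sample_size(population, margin, confidence=0.95, proportion=0.5):
    """Estimate respondents needed for a given margin of error.

    Hypothetical helper for illustration. Assumes simple random sampling
    and, by default, the worst case of a 50/50 split in answers.
    """
    # Two-sided z-score for the chosen confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Sample size for a very large (effectively unlimited) population.
    n0 = z ** 2 * proportion * (1 - proportion) / margin ** 2
    # Finite population correction: smaller populations need fewer people.
    n = n0 / (1 + (n0 - 1) / population)
    # Round up, since you cannot survey a fraction of a person.
    return math.ceil(n)


for level in (0.95, 0.99):
    needed = required_sample_size(50_000, 0.05, confidence=level)
    print(f"{level:.0%} confidence, +/- 5%: about {needed} respondents")
```

Changing the margin argument (for example to 0.01) shows how quickly the requirement climbs as the confidence interval gets tighter.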

For most of our natural products clients, a 95% confidence level is usually sufficient for their needs and provides a good combination of confidence and economy.

Different Confidence Intervals for Different Needs

When it comes to the confidence intervals (the + or – % part), different projects might require different interval widths.

Suppose you are asking some obvious questions about your brand and you expect that the results will be dramatically different, such as with 70% of people choosing one option and only a handful choosing the other options. In that case, you can probably get by with a large confidence interval and a smaller sample size.

However, if you’re trying to pick up on small differences, such as small differences in the appeal of your proposed packaging versus your current packaging, then you’ll need a smaller confidence interval (and a correspondingly larger sample size).

Let’s look at this packaging example a little closer. We know from our Natural Products Marketing Benchmark Report that new packaging greatly influences sales. If you are testing your existing packaging against two new options (A and B), and the results show version A performing 10 percentage points better than the others, you want to make sure that your confidence interval does not exceed + or – 5%. With a larger confidence interval, say +/- 10%, the ranges for option A and option B would overlap, and you wouldn’t know whether A’s lead was due to a real difference or just random variation.
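The overlap reasoning can be sketched in a few lines of Python. The scores below (45% for option A, 35% for option B) are made-up numbers for illustration only, and the function name intervals_overlap is hypothetical. Checking for overlap is a simple rule of thumb, not a formal significance test.

```python
def intervals_overlap(score_a, score_b, margin):
    """Return True if the two +/- margin ranges share any interior points.

    All values are whole percentage points (e.g. 45 means 45%).
    Hypothetical helper for illustration only.
    """
    low_a, high_a = score_a - margin, score_a + margin
    low_b, high_b = score_b - margin, score_b + margin
    # Ranges that merely touch at one endpoint are not counted as overlapping.
    return low_a < high_b and low_b < high_a


# Made-up results: option A at 45%, option B at 35% (10 points apart).
for margin in (5, 10):
    overlap = intervals_overlap(45, 35, margin)
    print(f"+/- {margin}%: ranges overlap? {overlap}")
```

With these made-up scores, the ranges separate at +/- 5% but run into each other at +/- 10%.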

How Confidence Level and Confidence Interval Impact Sample Size

In the next few paragraphs, I’m going to get more granular, so bear with me. In the formula for determining how many people you should sample, the necessary sample size is a) proportional to the square of a number derived from the confidence level (the “z-score”), and b) inversely proportional to the square of the confidence interval.

The z-score represents how many standard deviations are covered with your given confidence level. If you choose a 95% confidence level, then the z-score is 1.96 since 95% of a normal distribution lies within 1.96 standard deviations of the mean. If you choose a 99% confidence level, then the z-score is about 2.58.
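If you would rather look up a z-score than memorize it, Python’s standard library can compute one. This is a small sketch, assuming a two-sided interval (the usual + or – % form); the function name z_score is hypothetical.

```python
from statistics import NormalDist


def z_score(confidence):
    """Two-sided z-score for a confidence level such as 0.95.

    Hypothetical helper for illustration only.
    """
    # Split the leftover probability evenly between the two tails,
    # then ask the standard normal distribution where that cutoff falls.
    return NormalDist().inv_cdf(0.5 + confidence / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence level -> z-score {z_score(level):.2f}")
```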

If I lost you with the z-scores, this is what you need to remember: The higher you want your confidence level and the smaller you want your confidence interval to be, the larger your sample size has to be.

So for example, in a preliminary survey where you are expecting differences to be greater than 10% and you can handle a confidence interval as large as +/- 7%, you can probably get away with having only 200 or so respondents.

If you need fine-grained answers and need to be able to pick up on differences that are only 5% apart, then you’ll need 350-400 respondents.

If you are able to get 1,000 respondents, then your confidence intervals will be smaller, around +/- 3%.

It’s important to notice that the confidence intervals shrink in a nonlinear way as the sample size grows, since sample size is inversely proportional to the confidence interval squared. Therefore, sometimes you can make your sample a lot larger, but the confidence interval only shrinks a little bit.

To see how this works with real numbers, when sampling from a large population, the confidence interval for a sample size of 250 people is about +/- 6%. However, for a sample size four times larger at 1,000 people, the confidence interval is only half as large, at about +/- 3%.
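The diminishing returns are easy to see by working backward from sample size to confidence interval. The sketch below assumes a large population, a 95% confidence level (z-score of 1.96), and the worst case of a 50/50 split in answers; the function name margin_of_error is hypothetical.

```python
import math


def margin_of_error(sample_size, z=1.96, proportion=0.5):
    """Approximate +/- margin for a survey percentage.

    Hypothetical helper for illustration. Assumes a large population and
    simple random sampling; z=1.96 corresponds to 95% confidence.
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


# Each step doubles the sample, but the margin shrinks by only about 29%.
for n in (250, 500, 1000, 2000, 4000):
    print(f"{n:>5} respondents: +/- {margin_of_error(n):.1%}")
```

Because the margin depends on the square root of the sample size, quadrupling the sample is what it takes to cut the margin in half.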

Thus, after a certain point there are diminishing returns to using a larger sample size.

To take it to the extreme, if you wanted a 99% confidence level and a confidence interval of +/- 1% and you were asking questions about a population of around 50,000 people, then you’d need a sample size of more than 12,000 people.

However, with a 95% confidence level and a confidence interval of +/- 5%, you can get by with fewer than 400 people.

That’s the beauty of sampling – we can say things about the larger population with a reasonable level of confidence by talking to just a few hundred people in that population.