If you plan to use inferential statistics (e.g., t-tests, ANOVA) to analyze your evaluation results, you should first conduct a power analysis to determine how large a sample you will need. This page describes what power is as well as what you will need to calculate it.
To understand power, it is helpful to review what inferential statistics test. When you conduct an inferential statistical test, you are often comparing two hypotheses: the null hypothesis, which states that your program had no effect (any difference between groups is due to chance), and the alternative hypothesis, which states that your program did have an effect.
Statistical tests look for evidence that you can reject the null hypothesis and conclude that your program had an effect. With any statistical test, however, there is always the possibility that you will find a difference between groups when one does not actually exist. This is called a Type I error. Likewise, it is possible that when a difference does exist, the test will not be able to identify it. This type of mistake is called a Type II error.
Power refers to the probability that your test will find a statistically significant difference when such a difference actually exists. In other words, power is the probability that you will reject the null hypothesis when you should (and thus avoid a Type II error). It is generally accepted that power should be .8 or greater; that is, you should have an 80% or greater chance of finding a statistically significant difference when there is one.
Generally speaking, as your sample size increases, so does the power of your test. This should intuitively make sense as a larger sample means that you have collected more information -- which makes it easier to correctly reject the null hypothesis when you should.
To ensure that your sample size is big enough, you will need to conduct a power analysis calculation. Unfortunately, these calculations are not easy to do by hand, so unless you are a statistics whiz, you will want the help of a software program. Several software programs are available for free on the Internet and are described below.
For any power calculation, you will need to know your planned sample size, your alpha level, and the expected effect size (alpha and effect size are described later on this page).
When these values are entered, a power value between 0 and 1 will be generated. If the power is less than 0.8, you will need to increase your sample size.
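To illustrate how these inputs combine, here is a minimal Python sketch. The function name and the normal-approximation shortcut are my own; dedicated tools (such as G*Power or the statsmodels library) use more exact t-distribution calculations, so treat this as a rough estimate only.

```python
from math import ceil
from scipy.stats import norm

def sample_size_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate participants needed per group for a two-sided,
    two-sample t-test, using the normal approximation."""
    z_alpha = norm.ppf(1 - alpha / 2)  # critical value for the chosen alpha
    z_beta = norm.ppf(power)           # quantile for the desired power
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

# A medium effect (0.5) with alpha = .05 and power = .8:
print(sample_size_per_group(0.5))  # about 63 per group
```

Note how the required sample size grows quickly as the expected effect shrinks: detecting a small effect (0.2) requires roughly six times as many participants as detecting a medium one.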
Power analysis calculations assume that your evaluation will go as planned. It is quite likely, however, that you will not be able to use some of the data in your sample, which will decrease the power of your test. You may lose data, for example, when participants complete a pretest but drop out of the program before completing the posttest. Likewise, you may decide to exclude some data from your analysis because it is incomplete (e.g., when a participant only answers a few questions on a survey). To ensure that you have enough power in the end, estimate how many individuals you may lose from your sample and add that many to the sample size suggested by the power calculation.
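The adjustment for expected data loss is simple arithmetic. The sketch below (function name invented for illustration) divides by the expected retention rate rather than adding a fixed number, since dropouts come out of the larger recruited group:

```python
from math import ceil

def inflate_for_attrition(n_required, dropout_rate):
    """Inflate a required sample size so that, after expected
    attrition, roughly n_required usable cases remain."""
    return ceil(n_required / (1 - dropout_rate))

# If the power analysis calls for 63 participants and you expect
# to lose about 15% of them:
print(inflate_for_attrition(63, 0.15))  # recruit 75
```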
The following resources provide more information on power analysis and how to use it:
There is always some likelihood that the changes you observe in your participants’ knowledge, attitudes, and behaviors are due to chance rather than to the program. Testing for statistical significance helps you learn how likely it is that these changes occurred randomly and do not represent differences due to the program.
To learn whether the difference is statistically significant, you will have to compare the probability number you get from your test (the p-value) to the critical probability value you determined ahead of time (the alpha level). If the p-value is less than the alpha value, you can conclude that the difference you observed is statistically significant.
P-value: the probability that the results were due to chance and not based on your program. P-values range from 0 to 1. The lower the p-value, the more likely it is that a difference occurred as a result of your program.
Alpha (α) level: the error rate that you are willing to accept. Alpha is often set at .05 or .01. The alpha level is also known as the Type I error rate. An alpha of .05 means that you are willing to accept that there is a 5% chance that your results are due to chance rather than to your program.
An alpha level of less than .05 is accepted in most social science fields as statistically significant, and this is the most common alpha level used in EE evaluations.
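The comparison of p-value to alpha is mechanical once the test has been run. A minimal Python sketch, assuming scipy is available (the scores are invented for illustration):

```python
from scipy.stats import ttest_ind

alpha = 0.05  # chosen before the evaluation begins

# Hypothetical pretest and posttest scores for a small group:
pretest = [72, 68, 75, 70, 66, 74, 69, 71]
posttest = [80, 78, 83, 77, 75, 82, 79, 81]

t_stat, p_value = ttest_ind(posttest, pretest)
if p_value < alpha:
    print("The difference is statistically significant.")
else:
    print("The difference is not statistically significant.")
```

For a genuine pretest/posttest design with the same participants, a paired test (scipy's `ttest_rel`) would be the better fit; the independent-samples test is shown only to keep the example simple.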
The following resources provide more information on statistical significance:
When a difference is statistically significant, it does not necessarily mean that it is big, important, or helpful in decision-making. It simply means you can be confident that there is a difference. Let’s say, for example, that you evaluate the effect of an EE activity on student knowledge using pre and posttests. The mean score on the pretest was 83 out of 100 while the mean score on the posttest was 84. Although you find that the difference in scores is statistically significant (because of a large sample size), the difference is very slight, suggesting that the program did not lead to a meaningful increase in student knowledge.
To know if an observed difference is not only statistically significant but also important or meaningful, you will need to calculate its effect size. Rather than reporting the difference in terms of, for example, the number of points earned on a test or the number of pounds of recycling collected, effect size is standardized. In other words, all effect sizes are calculated on a common scale -- which allows you to compare the effectiveness of different programs on the same outcome.
There are different ways to calculate effect size depending on the evaluation design you use. Generally, effect size is calculated by taking the difference between the two groups (e.g., the mean of the treatment group minus the mean of the control group) and dividing it by the standard deviation of one of the groups -- typically the control group:
effect size = (mean of treatment group – mean of control group) / (standard deviation of control group)
To interpret the resulting number, most social scientists use the general guide developed by Cohen: an effect size of 0.2 is considered small, 0.5 medium, and 0.8 large.
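The formula above translates directly into code. A minimal sketch (the data and function name are invented; dividing by the control group's standard deviation, as described above, is the convention sometimes called Glass's delta):

```python
import statistics

def effect_size(treatment, control):
    """Difference in group means divided by the control group's
    standard deviation (the convention described above)."""
    diff = statistics.mean(treatment) - statistics.mean(control)
    return diff / statistics.stdev(control)

# Hypothetical posttest scores:
treatment = [82, 84, 79, 87, 83, 81]
control = [78, 82, 75, 85, 80, 79]

d = effect_size(treatment, control)  # roughly 0.8, a large effect
```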
Because effect size can only be calculated after you collect data from program participants, you will have to use an estimate for the power analysis. Common practice is to use a value of 0.5, which indicates a medium effect.
For more information on effect size, see:
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.
Smith, M. (2004). Is it the sample size or the sample as a fraction of the population that matters? Journal of Statistics Education, 12(2). Retrieved September 14, 2006 from http://www.amstat.org/publications/jse/v12n2/smith.html
Patton, M. Q. (1990). Qualitative research and evaluation methods. London: Sage Publications.