# Design effect

### Overview

Many statistical techniques have been developed to compare groups, including t-tests and ANOVAs. In general, these techniques assume that all the individuals in each group are independent of one another. However, in many instances, this assumption is violated.

In essence, the design effect is an index that reflects the extent to which this assumption is violated. This measure can also be used to adjust the resulting statistics.

### Violation of independence

To illustrate, suppose that researchers wanted to ascertain whether tap water is healthier than bottled water. These researchers could randomly assign some children to drink tap water and other children to drink bottled water. The number of sick days over the next year can then be collated. The data can then be subjected to a t-test.

However, this design is not always possible. For instance, at school, the children assigned bottled water might share their drink with the children assigned tap water. So, another design might be preferable.

Specifically, rather than randomly assign individuals to each condition, the researchers might assign schools to each condition. That is, the students at some schools might be assigned to drink tap water and the students at other schools might be assigned to drink bottled water.

Nevertheless, when this design is utilized, a problem unfolds. In this instance, the individuals in each group are not really independent of one another. That is, each group comprises clusters of individuals, each corresponding to one school. The individuals within each cluster are probably more similar to each other than individuals in different clusters. For example, if one student is sick, other students in the same school are also often sick but students at other schools are not as likely to be sick. Consequently, one of the assumptions of many statistical tests is violated.

#### Effect of violations of independence

In general, when this assumption of independence is violated, statistical power diminishes. That is, the probability of detecting a genuine effect as significant decreases.

To clarify the extent to which the power diminishes, some researchers calculate the design effect. The design effect equals 1 + (n - 1) × p. In this formula, n is the average number of individuals in each cluster. That is, in this example, n is the average number of students in each participating school.

In addition, p, sometimes called rho, is the intra-cluster correlation coefficient. This value ranges from 0 to 1. If the individuals within each cluster are relatively similar to one another, this value approaches 1. If the individuals within each cluster are no more similar to one another than to individuals in other clusters, this value approaches 0. A common value of this intra-cluster correlation coefficient is .05.

To illustrate the impact of this design effect, suppose the average number of individuals in each school cluster, n, is 51. In addition, suppose the intra-cluster correlation coefficient is .05. The design effect would, therefore, equal 1 + (51 - 1) × .05 = 1 + 50 × .05 = 3.5.

So, how can this 3.5 be interpreted? This number provides some insight into the extent to which the absence of independence diminishes power. Specifically, to achieve the same power as a study in which this assumption is not violated, the sample size would need to be 3.5 times as large.
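The arithmetic above can be captured in a short helper function (a minimal Python sketch; the function name is illustrative):

```python
def design_effect(n_bar, rho):
    """Design effect = 1 + (n_bar - 1) * rho, where n_bar is the
    average cluster size and rho is the intra-cluster correlation."""
    return 1 + (n_bar - 1) * rho

# Worked example from the text: 51 students per school, rho = .05
print(design_effect(51, 0.05))  # 3.5
```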

### Calculation of the design effect

#### Computation of the intra-cluster correlation coefficient

Remember that the design effect = 1 + (n - 1) × p. To calculate this design effect, you first need to estimate the intra-cluster correlation coefficient. Technically, the intra-cluster correlation coefficient = (variance between clusters) / (variance between clusters + variance within clusters).

This formula clarifies the meaning of the intra-cluster correlation coefficient. Nevertheless, it is not easy to use in practice. A more practical formula is: intra-cluster correlation coefficient = (MS between - MS error) / (MS between + (n - 1) × MS error). In particular, to calculate the design effect:

• In your data file (in SPSS, for example), construct a column called cluster.
• In this column, each number corresponds to a separate cluster. For example, in this study, 1 is assigned to everyone in the first school, 2 is assigned to everyone in the second school, and so forth.
• Conduct a one-way ANOVA. The independent variable is this cluster column, representing the various schools in this example. The dependent variable is the number of sick days.
• The intra-cluster correlation coefficient equals (MS between - MS error) / (MS between + (n - 1) × MS error), where n is the average number of people in each cluster.
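The same computation can be performed outside SPSS. The following is a minimal Python sketch, assuming hypothetical sick-day counts for three schools; the mean squares come directly from the one-way ANOVA decomposition:

```python
def icc_from_anova(groups):
    """Estimate the intra-cluster correlation coefficient from the
    mean squares of a one-way ANOVA, treating each cluster (school)
    as a level of the independent variable.

    groups: list of lists of scores, one list per cluster."""
    k = len(groups)                               # number of clusters
    sizes = [len(g) for g in groups]
    total_n = sum(sizes)
    grand_mean = sum(sum(g) for g in groups) / total_n
    means = [sum(g) / len(g) for g in groups]

    # ANOVA decomposition: between-cluster and within-cluster sums of squares
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    ss_error = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    ms_between = ss_between / (k - 1)
    ms_error = ss_error / (total_n - k)
    n_bar = total_n / k                           # average cluster size

    icc = (ms_between - ms_error) / (ms_between + (n_bar - 1) * ms_error)
    return icc, n_bar

# Hypothetical sick-day counts for three schools
schools = [[4, 5, 6, 5], [2, 3, 2, 3], [7, 8, 6, 7]]
icc, n_bar = icc_from_anova(schools)
design_effect = 1 + (n_bar - 1) * icc
```

In this made-up example, the schools differ markedly, so the estimated intra-cluster correlation is high and the design effect is correspondingly large.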

### Application of the design effect

The design effect can be utilized to adjust the t value. To illustrate, suppose the researchers initially conduct a t-test and discover that t = 2.0. Because the clusters were disregarded, this t value overestimates the actual t value; that is, it is biased. To correct this bias:

• Divide the t value by the square root of the design effect.
• If the design effect is 3.5, this square root is approximately 1.87.
• The adjusted t value is, therefore, 2 / 1.87 ≈ 1.07.
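The steps above can be sketched as a small function (an illustrative snippet; the function name is an assumption):

```python
import math

def adjust_t(t, deff):
    """Divide the naive t value by the square root of the design effect."""
    return t / math.sqrt(deff)

# Worked example from the text: t = 2.0, design effect = 3.5
print(round(adjust_t(2.0, 3.5), 2))  # 1.07
```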

### Complications

This article has assumed that the design effect equals 1 + (n - 1) × p. Actually, when the number of people varies across clusters, this equation is not entirely accurate: it slightly underestimates the true design effect. More sophisticated analyses are needed to calculate the design effect more precisely.
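One common approximation replaces the simple average cluster size with the cluster-size-weighted mean, n* = sum(n_i²) / sum(n_i), which is never smaller than the simple average. A brief illustration (the function name and cluster sizes are hypothetical):

```python
def design_effect_unequal(sizes, rho):
    """Design effect when cluster sizes vary, substituting the
    cluster-size-weighted mean n* = sum(n_i^2) / sum(n_i) for the
    simple average (one common approximation, not the only one)."""
    n_star = sum(n ** 2 for n in sizes) / sum(sizes)
    return 1 + (n_star - 1) * rho

equal = design_effect_unequal([51, 51, 51], 0.05)    # reduces to 3.5
unequal = design_effect_unequal([21, 51, 81], 0.05)  # larger, despite the same average size
```

Note that the three unequal clusters average 51 people, yet their design effect exceeds 3.5, illustrating the underestimation described above.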
