The majority of scientific papers in psychology report tests of significance--z, t, and F values, for example. Generally, if these values are high, the researchers can conclude the groups differ significantly from one another or the variables are significantly related to one another.

Nevertheless, a high z, t, or F value does not necessarily imply the difference between the groups--or the relationship between these variables--is large. A high z, t, or F value can be generated when the difference between the groups, or the relationship between these variables, is small, provided the sample size is large. That is, a high z, t, or F value merely indicates the researcher can be quite certain the groups differ from each other or the variables are related to each other.
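To illustrate this point, the following minimal sketch assumes two independent groups of equal size, in which case the t value equals the d value multiplied by the square root of half the group size. The function name `t_from_d` and the sample sizes are hypothetical, chosen only for illustration.

```python
import math


def t_from_d(d, n_per_group):
    """Return the t value for two equal-sized independent groups.

    Assumes the standard independent-groups t test with a pooled
    standard deviation, for which t = d * sqrt(n / 2).
    """
    return d * math.sqrt(n_per_group / 2)


# The same small effect (d = 0.1) evaluated at increasing sample sizes.
for n in (20, 200, 2000):
    print(n, round(t_from_d(0.1, n), 2))
```

The effect size is identical in each case, but the t value rises as the sample grows, which is why a high t value alone does not indicate a large difference.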

In contrast, measures of effect size, such as the Cohen d value, represent the extent to which the groups differ from one another or the degree to which the variables are related (Cohen, 1965). To illustrate, suppose that researchers want to examine whether males or females have acquired more Facebook friends. The d value merely equals the difference between the means of the two genders divided by the standard deviation within each gender--technically, the pooled standard deviation.
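This formula can be sketched in a few lines. The function name `cohen_d` and the friend counts below are hypothetical, and the pooled standard deviation is computed from the sample variances weighted by their degrees of freedom.

```python
import math
from statistics import mean, variance


def cohen_d(group1, group2):
    """Difference between the group means divided by the pooled SD."""
    n1, n2 = len(group1), len(group2)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(group1)
                  + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / math.sqrt(pooled_var)


# Hypothetical numbers of Facebook friends in each gender.
males = [100, 120, 140, 160, 180]
females = [110, 130, 150, 170, 190]
print(round(cohen_d(females, males), 3))
```

In this invented example the means differ by 10 friends and the pooled standard deviation is about 31.6, so the d value is roughly .32.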

If the d value, sometimes called delta, is approximately .20, the effect size is regarded as small. That is, researchers would conclude that any difference between the genders is not especially pronounced, almost imperceptible to the average person. If the d value is approximately .50, the effect size is regarded as medium. Specifically, researchers would conclude that the difference between the genders is moderate, obvious to experts but probably not to a layperson. Finally, if the d value is approximately .80, the effect size is regarded as large. In particular, researchers would conclude that the difference between the genders is conspicuous, clear to almost anyone.

Unfortunately, this d value is not especially robust. That is, almost trivial changes in the distribution of the population can generate major differences in the Cohen d value. This article presents a variant, developed by Algina, Keselman, and Penfield (2005). To compute this d value, researchers should trim, or Winsorize, the lowest and highest 20% of values in each group, apply the formula developed by Cohen, and finally multiply the answer by .642. This measure is suitable whenever researchers need to compare two independent groups or conditions.

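Algina, Keselman, and Penfield (2005) described a technique to compute a robust variant of delta. First, researchers should trim the highest 20% of values in each group, a procedure called Winsorizing. That is, the values in the top 20% of each group are converted to the value that marks the boundary of this top 20%, rather than discarded.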

Second, researchers should trim or Winsorize the lowest 20% of values, using the same method. In particular:

- For each group, identify the 20th percentile--that is, the value that 80% of the values exceed.
- For example, if the sample comprises 50 males and 50 females, this percentile is the fortieth highest value for each gender.
- Then, for each group, convert the values that are below this percentile to this value.
- For example, if this percentile for males is 42, values below 42 in males are converted to 42. Similarly, if this percentile for females is 61, then values below 61 in females are converted to 61.

Third, apply the usual formula, developed by Cohen, to these trimmed or Winsorized scores, and multiply the result by .642.
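The three steps above can be sketched as follows. This is a minimal illustration of the procedure as this article describes it, not the authors' own code. The function names `winsorize` and `robust_d` are hypothetical, and the rule for choosing the boundary values (the floor of 20% of the group size) is an assumption. The published definition may differ in detail, for instance in how the means are computed, so the original paper should be consulted before this measure is reported.

```python
import math
from statistics import mean, variance


def winsorize(values, proportion=0.20):
    """Return the sorted values with each tail pulled in to its boundary."""
    ordered = sorted(values)
    n = len(ordered)
    k = math.floor(proportion * n)  # number of values altered in each tail
    if k == 0:
        return ordered
    low, high = ordered[k], ordered[n - k - 1]
    # Values below `low` become `low`; values above `high` become `high`.
    return [min(max(v, low), high) for v in ordered]


def robust_d(group1, group2, proportion=0.20, scale=0.642):
    """Cohen's formula applied to Winsorized scores, multiplied by .642."""
    w1, w2 = winsorize(group1, proportion), winsorize(group2, proportion)
    n1, n2 = len(w1), len(w2)
    pooled_var = ((n1 - 1) * variance(w1)
                  + (n2 - 1) * variance(w2)) / (n1 + n2 - 2)
    return scale * (mean(w1) - mean(w2)) / math.sqrt(pooled_var)


# Hypothetical scores; the extreme value 100 is pulled in to 8.
print(winsorize([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
# -> [3, 3, 3, 4, 5, 6, 7, 8, 8, 8]
```

With 50 values in a group, `k` equals 10, so the lower boundary is the eleventh lowest value, which is the fortieth highest value mentioned in the example above.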

As demonstrated by Algina, Keselman, and Penfield (2005), this variant of delta is more robust than the original Cohen d value. With the original index, trivial changes in the population or sample can generate major shifts in the value. This problem, according to Algina, Keselman, and Penfield (2005), arises because the original index is too dependent on variations in the tails of each distribution. As a consequence, deviations from a normal distribution can affect the tails and bias this measure appreciably. Winsorizing overcomes this problem.

To interpret the magnitude of this effect size, the guidelines that were developed by Cohen should be applied in this instance as well. That is, values of 0.2, 0.5, and 0.8 represent small, medium, and large effects respectively.

In addition, Algina, Keselman, and Penfield (2005) recommend that researchers use bootstrap procedures to estimate confidence intervals. In their research, these bootstrap procedures were very effective.
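As a rough sketch of how such an interval might be estimated, the following code applies a percentile bootstrap, which is only one of several bootstrap variants. This article does not specify which variant the authors used, so that choice is an assumption. The names `bootstrap_ci` and `mean_difference`, the number of resamples, and the data are all hypothetical. Any function that takes two groups and returns an effect size, such as a robust d function, could be passed in place of `mean_difference`.

```python
import random
from statistics import mean


def mean_difference(group1, group2):
    """Stand-in statistic; an effect size function could be used instead."""
    return mean(group1) - mean(group2)


def bootstrap_ci(group1, group2, statistic, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a two-group statistic."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    estimates = []
    for _ in range(n_boot):
        # Resample each group separately, with replacement.
        sample1 = rng.choices(group1, k=len(group1))
        sample2 = rng.choices(group2, k=len(group2))
        estimates.append(statistic(sample1, sample2))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical scores for two independent groups.
group_a = list(range(10, 20))
group_b = list(range(0, 10))
print(bootstrap_ci(group_a, group_b, mean_difference))
```

When the statistic divides by a standard deviation, very small groups can occasionally produce a resample with no variation, so in practice the statistic function may need to handle that case.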

Many alternative indices of effect size are applicable, even when researchers merely want to compare two groups, such as males and females, on some measure. Many of these measures are also applicable to studies in which researchers compare more than two groups:

- Eta squared represents the between group variance divided by the total variance--and is thus the proportion of total variance explained by the groups.
- The square root of this value is equal to the correlation between a dummy variable representing the groups and the dependent measure (Algina, Keselman, & Penfield, 2005).
- The sample common language estimate of effect size, published by McGraw and Wong (1992), represents the probability that a score in one population exceeds a score in another population, assuming the distributions are normal and the variances of each group are homogeneous.
- The dominance statistic, proposed by Cliff (1993, 1996), represents p2>1 - p1>2. In this instance, p2>1 is the probability that a score from the population with the higher mean exceeds a score from the population with the lower mean; p1>2 is the probability that a score from the population with the lower mean exceeds a score from the population with the higher mean.
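The indices in this list can be sketched for sample data as follows. The function names and scores are hypothetical. The common language function uses the usual normal-theory calculation, in which the difference between the means is divided by the square root of the sum of the two variances, and the dominance function simply counts pairs of scores. Both are offered as illustrations rather than as the exact estimators in the cited papers.

```python
import math
from statistics import NormalDist, mean, variance


def eta_squared(groups):
    """Between-group sum of squares divided by the total sum of squares."""
    scores = [v for group in groups for v in group]
    grand_mean = mean(scores)
    ss_total = sum((v - grand_mean) ** 2 for v in scores)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    return ss_between / ss_total


def common_language(group1, group2):
    """Normal-theory probability that a group1 score exceeds a group2 score."""
    z = (mean(group1) - mean(group2)) / math.sqrt(
        variance(group1) + variance(group2))
    return NormalDist().cdf(z)


def dominance(group1, group2):
    """Proportion of pairs with group1 higher minus proportion with it lower."""
    pairs = len(group1) * len(group2)
    higher = sum(a > b for a in group1 for b in group2)
    lower = sum(a < b for a in group1 for b in group2)
    return (higher - lower) / pairs


# Hypothetical scores for two groups that do not overlap.
low_group, high_group = [1, 2, 3], [4, 5, 6]
print(round(eta_squared([low_group, high_group]), 3))
print(dominance(high_group, low_group))
```

Because every score in the second group exceeds every score in the first, the dominance statistic reaches its maximum of 1 in this invented example.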

Algina, J., Keselman, H. J., & Penfield, R. D. (2005). An alternative to Cohen's standardized mean difference effect size: A robust parameter and confidence interval in the two independent groups case. **Psychological Methods**, 10, 317-328.

Cliff, N. (1993). Dominance statistics: Ordinal analyses to answer ordinal questions. **Psychological Bulletin**, 114, 494-509.

Cliff, N. (1996). Answering ordinal questions with ordinal data using ordinal statistics. **Multivariate Behavioral Research**, 31, 331-350.

Cohen, J. (1965). Some statistical issues in psychological research. In B. B. Wolman (Ed.), Handbook of clinical psychology (pp. 95-121). New York: Academic Press.

McGraw, K. O., & Wong, S. P. (1992). A common language effect size statistic. **Psychological Bulletin**, 111, 361-365.


Last Update: 6/26/2016