False discovery rate

Author: Dr Simon Moss

Overview

The false discovery rate refers to a procedure that researchers and statisticians sometimes use when they need to conduct many statistical tests, all of which correspond to an overlapping hypothesis. In these instances, some researchers apply a Bonferroni adjustment to ensure the probability of Type I errors does not exceed .05. This approach, however, is sometimes regarded as unnecessarily conservative, reducing power appreciably. The false discovery rate is sometimes conceptualized as a compromise between the Bonferroni adjustment and no adjustment at all.

Function of the false discovery rate

Need to control the Type I error rate

To demonstrate the significance of Bonferroni adjustments and the false discovery rate, consider the following example. A researcher wants to examine whether personality is related to intelligence. Participants complete a personality inventory that measures five traits: extraversion, neuroticism, conscientiousness, agreeableness, and openness. In addition, participants complete three measures of intelligence: a test of verbal ability, a test of numerical ability, and a test of abstract ability. The correlation between each personality trait and each ability test appears in the following table.


| | Extraversion | Neuroticism | Conscientiousness | Agreeableness | Openness |
|---|---|---|---|---|---|
| Verbal ability | r = .17, p = .10 | r = .07, p = .29 | r = .33, p = .005 | r = .05, p = .56 | r = .16, p = .15 |
| Numerical ability | r = .18, p = .08 | r = .47, p = .004 | r = .20, p = .09 | r = .21, p = .09 | r = .06, p = .35 |
| Abstract ability | r = .29, p = .04 | r = .04, p = .57 | r = .28, p = .04 | r = .15, p = .15 | r = .07, p = .30 |

In this instance, the p value for 4 of the correlations is less than .05 and thus significant. The researcher will, therefore, conclude that personality is related to ability.

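The problem, however, is that 15 tests have been conducted to assess the same hypothesis: that personality is related to ability. If personality and ability were unrelated in the population, each test would still have a .05 probability of reaching significance by chance, so the probability that at least one of the 15 tests is significant by chance alone is considerably greater than .05.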
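As a minimal sketch of this rationale, assuming the 15 tests are independent and every null hypothesis is true (both simplifying assumptions that the example data need not satisfy), the probability of at least one Type I error can be computed directly. The function name is introduced here for illustration only.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one Type I error across n_tests independent
    tests, assuming every null hypothesis is true (an assumption)."""
    return 1.0 - (1.0 - alpha) ** n_tests


# 15 tests, each conducted at alpha = .05
rate = familywise_error_rate(0.05, 15)
print(f"{rate:.2f}")  # prints 0.54
```

Under these assumptions, the researcher would falsely conclude that personality is related to ability more than half the time.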

Benefits and drawbacks of the Bonferroni adjustment

To overcome this problem, some researchers apply a Bonferroni adjustment. Specifically, they change the level of alpha. In particular, the level of alpha they apply to each test is equal to the original level of alpha divided by the number of tests. In this instance, for example, the alpha applied to each test would equal .05 / 15, or approximately .0033.

The Bonferroni adjustment ensures the probability that at least one of the 15 tests could generate a Type I error is approximately .05. Hence, the Bonferroni adjustment ensures the probability that researchers falsely conclude that personality is related to ability would not exceed .05.
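The adjustment can be sketched in a few lines, using the 15 p values from the table above. The function names are hypothetical, and the "less than or equal to" comparison is an assumption about how ties with the threshold are treated.

```python
# p values from the correlation table, row by row
P_VALUES = [0.10, 0.29, 0.005, 0.56, 0.15,   # verbal ability
            0.08, 0.004, 0.09, 0.09, 0.35,   # numerical ability
            0.04, 0.57, 0.04, 0.15, 0.30]    # abstract ability


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test alpha after a Bonferroni adjustment."""
    return alpha / n_tests


def bonferroni_significant(p_values, alpha=0.05):
    """Return the p values that remain significant after the adjustment."""
    threshold = bonferroni_alpha(alpha, len(p_values))
    return [p for p in p_values if p <= threshold]


adjusted = bonferroni_alpha(0.05, len(P_VALUES))
survivors = bonferroni_significant(P_VALUES)
print(round(adjusted, 4), survivors)
```

With these data, the smallest p value (.004) exceeds the adjusted alpha, so none of the correlations remains significant, which illustrates how conservative the adjustment can be.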

Nevertheless, the Bonferroni adjustment appreciably reduces power; that is, the adjustment diminishes the likelihood that researchers will generate significant results when the variables are indeed related in the population.

Several scholars have recommended refinements to the Bonferroni adjustment that marginally redress this problem. However, even these refined variants are not especially powerful.

False discovery rate

The false discovery rate is often regarded as a compromise between Bonferroni adjustments and the absence of any adjustment. The basic rationale is that researchers should not necessarily attempt to ensure the probability of at least one Type I error is .05. Instead, researchers should attempt to control the false discovery rate, which is the expected proportion of significant effects that are false: the number of false significant effects divided by the number of significant effects.
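In symbols, this rate can be written as follows. The labels V (the number of false significant effects) and R (the total number of significant effects) are introduced here for illustration, and the ratio is taken to be zero when no effect is significant.

```latex
\mathrm{FDR} = E\!\left[\frac{V}{\max(R,\,1)}\right]
```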

To conduct this technique, which was developed by Benjamini and Hochberg (1995), several steps are conducted. First, the researcher orders the statistics from the smallest to the largest p values. In the previous example, the order would be:

| Position in sequence | Correlation and p value |
|---|---|
| 1 | Neuroticism and numerical ability, r = .47, p = .004 |
| 2 | Conscientiousness and verbal ability, r = .33, p = .005 |
| 3 | Extraversion and abstract ability, r = .29, p = .04 |
| 4 | Conscientiousness and abstract ability, r = .28, p = .04 |
| 5 | Extraversion and numerical ability, r = .18, p = .08 |
| 6 | Agreeableness and numerical ability, r = .21, p = .09 |
| 7 | Conscientiousness and numerical ability, r = .20, p = .09 |
| 8 | Extraversion and verbal ability, r = .17, p = .10 |
| 9 | Openness and verbal ability, r = .16, p = .15 |
| 10 | Agreeableness and abstract ability, r = .15, p = .15 |
| 11 | Neuroticism and verbal ability, r = .07, p = .29 |
| 12 | Openness and abstract ability, r = .07, p = .30 |
| 13 | Openness and numerical ability, r = .06, p = .35 |
| 14 | Agreeableness and verbal ability, r = .05, p = .56 |
| 15 | Neuroticism and abstract ability, r = .04, p = .57 |

For each of these correlations, the researcher needs to compute the position in the sequence divided by the number of tests, multiplied by alpha. To illustrate, for the first correlation, the researcher would calculate 1 / 15 x .05, which equals .003. For the second correlation, the researcher would calculate 2 / 15 x .05, which equals .007, and so forth. The outcome of these calculations is called (i/m) alpha, as shown below.

| (i/m) alpha | Correlation and p value |
|---|---|
| 0.003 | Neuroticism and numerical ability, r = .47, p = .004 |
| 0.007 | Conscientiousness and verbal ability, r = .33, p = .005 |
| 0.010 | Extraversion and abstract ability, r = .29, p = .04 |
| 0.013 | Conscientiousness and abstract ability, r = .28, p = .04 |
| 0.017 | Extraversion and numerical ability, r = .18, p = .08 |
| 0.020 | Agreeableness and numerical ability, r = .21, p = .09 |
| 0.023 | Conscientiousness and numerical ability, r = .20, p = .09 |
| 0.027 | Extraversion and verbal ability, r = .17, p = .10 |
| 0.030 | Openness and verbal ability, r = .16, p = .15 |
| 0.033 | Agreeableness and abstract ability, r = .15, p = .15 |
| 0.037 | Neuroticism and verbal ability, r = .07, p = .29 |
| 0.040 | Openness and abstract ability, r = .07, p = .30 |
| 0.043 | Openness and numerical ability, r = .06, p = .35 |
| 0.047 | Agreeableness and verbal ability, r = .05, p = .56 |
| 0.050 | Neuroticism and abstract ability, r = .04, p = .57 |

Finally, the researcher scans the table, beginning at the bottom, comparing each (i/m) alpha value with the corresponding p value. The researcher stops scanning at the first row in which the (i/m) alpha value is greater than or equal to the p value.

In this instance, they would continue scanning until they reach the second row from the top, in which the (i/m) alpha value of 0.007 is greater than p = .005. Hence, they would conclude this row and all preceding rows are significant. Specifically, in this instance, two of the correlations are significant: neuroticism and numerical ability as well as conscientiousness and verbal ability.
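The steps described above can be sketched as follows. This is an illustrative implementation rather than a reference one: the function name and return format are assumptions, and a p value equal to its threshold is treated as significant.

```python
# p values from the correlation table, row by row
P_VALUES = [0.10, 0.29, 0.005, 0.56, 0.15,
            0.08, 0.004, 0.09, 0.09, 0.35,
            0.04, 0.57, 0.04, 0.15, 0.30]


def benjamini_hochberg(p_values, alpha=0.05):
    """Return (ordered p values, (i/m) alpha thresholds, number significant)."""
    m = len(p_values)
    ordered = sorted(p_values)                      # smallest to largest
    thresholds = [(i / m) * alpha for i in range(1, m + 1)]
    n_significant = 0
    # Scan from the bottom of the table (largest p value) upwards and stop
    # at the first row in which the p value does not exceed its threshold.
    for i in range(m, 0, -1):
        if ordered[i - 1] <= thresholds[i - 1]:
            n_significant = i                       # this row and all above
            break
    return ordered, thresholds, n_significant


ordered, thresholds, n_significant = benjamini_hochberg(P_VALUES)
print(n_significant, ordered[:n_significant])       # prints 2 [0.004, 0.005]
```

Scanning from the bottom matters: a row can be declared significant even if a row beneath the stopping point failed its own comparison, because every row above the stopping point is accepted.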

This procedure is intended to control the false discovery rate. In particular, when this procedure is applied, the false discovery rate tends to be approximately .05 to .15.

Adjusted false discovery rate

Benjamini and Hochberg (2000) presented an adjusted variant of the false discovery rate procedure. To complete this procedure, the researcher first completes the technique recommended by Benjamini and Hochberg (1995). If none of the effects are significant, that is, if none of the p values is less than or equal to the corresponding (i/m) alpha value, no further testing is required. However, if one or more of the effects are significant, the procedure continues.

To continue, for each p value, researchers must calculate the slope, which equals (1 - p value) / (number of tests - position in sequence + 1). For example, for the first p value, the slope equals (1 - .004) / (15 - 1 + 1), which equals 0.066. The other slopes appear in the table below.

| Slope | (i/m) alpha | Correlation and p value |
|---|---|---|
| 0.066 | 0.003 | Neuroticism and numerical ability, r = .47, p = .004 |
| 0.071 | 0.007 | Conscientiousness and verbal ability, r = .33, p = .005 |
| 0.074 | 0.010 | Extraversion and abstract ability, r = .29, p = .04 |
| 0.080 | 0.013 | Conscientiousness and abstract ability, r = .28, p = .04 |
| 0.084 | 0.017 | Extraversion and numerical ability, r = .18, p = .08 |
| 0.091 | 0.020 | Agreeableness and numerical ability, r = .21, p = .09 |
| 0.101 | 0.023 | Conscientiousness and numerical ability, r = .20, p = .09 |
| 0.113 | 0.027 | Extraversion and verbal ability, r = .17, p = .10 |
| 0.121 | 0.030 | Openness and verbal ability, r = .16, p = .15 |
| 0.142 | 0.033 | Agreeableness and abstract ability, r = .15, p = .15 |
| 0.141 | 0.037 | Neuroticism and verbal ability, r = .07, p = .29 |
| 0.175 | 0.040 | Openness and abstract ability, r = .07, p = .30 |
| 0.217 | 0.043 | Openness and numerical ability, r = .06, p = .35 |
| 0.220 | 0.047 | Agreeableness and verbal ability, r = .05, p = .56 |
| 0.430 | 0.050 | Neuroticism and abstract ability, r = .04, p = .57 |

Next, beginning at the top, researchers scan down the table until one of the slopes is less than a previous slope. In this instance, .141 is less than .142. The final slope that is scanned, in this instance .141, is called Sj.

This information, according to Benjamini and Hochberg (2000), can be used to estimate the number of true non-significant tests, that is, the number of null correlations in the population. In particular, the number of true non-significant tests equals either the total number of tests or (1/Sj + 1), whichever is smaller. In this instance, the total number of tests is 15. In addition, (1/Sj + 1) = (1/.141 + 1) = 8.09, which should be rounded up to an integer, which in this instance is 9. Hence, the number of true non-significant tests probably equals about 9.
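The slope calculation and the estimate of the number of true non-significant tests can be sketched as follows. The function names are hypothetical, and the fallback used when no slope decreases (returning the total number of tests) is an assumption rather than something stated above. Slopes recomputed from the rounded p values in the table can differ slightly in the third decimal place from the reported slopes, presumably because the table was derived from unrounded p values, so the usage example relies on the slopes as reported.

```python
import math


def slopes_from_p(sorted_p):
    """Slope for position i: (1 - p) / (number of tests - i + 1)."""
    m = len(sorted_p)
    return [(1 - p) / (m - i + 1) for i, p in enumerate(sorted_p, start=1)]


def estimate_true_nulls(slopes, n_tests):
    """Scan from the top; stop at the first slope smaller than the one
    before it (Sj) and return min(n_tests, ceil(1 / Sj + 1))."""
    for previous, current in zip(slopes, slopes[1:]):
        if current < previous:
            return min(n_tests, math.ceil(1 / current + 1))
    return n_tests  # assumption: no decrease found, keep all tests


# Slopes exactly as reported in the table above
REPORTED_SLOPES = [0.066, 0.071, 0.074, 0.080, 0.084, 0.091, 0.101, 0.113,
                   0.121, 0.142, 0.141, 0.175, 0.217, 0.220, 0.430]

m0 = estimate_true_nulls(REPORTED_SLOPES, 15)
print(m0)  # prints 9
```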

Next, for each correlation, the researcher calculates alpha multiplied by the position in sequence divided by the number of true non-significant tests, called (i/mo) alpha. For example, for the first p value, the researcher would compute 0.05 x 1 / 9, which equals approximately 0.006. These values are presented in the following table.

| (i/mo) alpha | (i/m) alpha | Correlation and p value |
|---|---|---|
| 0.006 | 0.003 | Neuroticism and numerical ability, r = .47, p = .004 |
| 0.011 | 0.007 | Conscientiousness and verbal ability, r = .33, p = .005 |
| 0.017 | 0.010 | Extraversion and abstract ability, r = .29, p = .04 |
| 0.022 | 0.013 | Conscientiousness and abstract ability, r = .28, p = .04 |
| 0.028 | 0.017 | Extraversion and numerical ability, r = .18, p = .08 |
| 0.033 | 0.020 | Agreeableness and numerical ability, r = .21, p = .09 |
| 0.039 | 0.023 | Conscientiousness and numerical ability, r = .20, p = .09 |
| 0.044 | 0.027 | Extraversion and verbal ability, r = .17, p = .10 |
| 0.050 | 0.030 | Openness and verbal ability, r = .16, p = .15 |
| 0.056 | 0.033 | Agreeableness and abstract ability, r = .15, p = .15 |
| 0.061 | 0.037 | Neuroticism and verbal ability, r = .07, p = .29 |
| 0.067 | 0.040 | Openness and abstract ability, r = .07, p = .30 |
| 0.072 | 0.043 | Openness and numerical ability, r = .06, p = .35 |
| 0.078 | 0.047 | Agreeableness and verbal ability, r = .05, p = .56 |
| 0.083 | 0.050 | Neuroticism and abstract ability, r = .04, p = .57 |

Finally, the researcher, beginning at the bottom, scans the table until a p value is less than (i/mo) alpha. Again, this condition is not fulfilled until the second row from the top, in which .005 is less than 0.011. Hence, the researcher would conclude that two of the correlations are significant: neuroticism and numerical ability as well as conscientiousness and verbal ability.

Holland and Cheung (2002) highlight that, sometimes, researchers who apply this procedure might conclude that an effect is significant even when the p value exceeds .05 or alpha. In these instances, the effect should be regarded as not significant.
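The final scan of the adjusted procedure, together with the safeguard just described, might be sketched as follows. The function name is hypothetical, the estimate of 9 true non-significant tests is taken from the worked example, and treating a p value equal to its threshold as significant is an assumption.

```python
# p values from the correlation table, row by row
P_VALUES = [0.10, 0.29, 0.005, 0.56, 0.15,
            0.08, 0.004, 0.09, 0.09, 0.35,
            0.04, 0.57, 0.04, 0.15, 0.30]


def adaptive_bh(p_values, m0, alpha=0.05):
    """Return the p values declared significant by the adjusted procedure.

    m0 is the estimated number of true non-significant tests.
    """
    ordered = sorted(p_values)
    n_significant = 0
    # Scan from the bottom until a p value does not exceed (i/m0) * alpha.
    for i in range(len(ordered), 0, -1):
        if ordered[i - 1] <= (i / m0) * alpha:
            n_significant = i
            break
    # Safeguard: an effect whose p value exceeds alpha is not significant,
    # even if the scan would otherwise accept it.
    return [p for p in ordered[:n_significant] if p <= alpha]


print(adaptive_bh(P_VALUES, m0=9))  # prints [0.004, 0.005]
```

The safeguard has no effect in this example, because the largest threshold that is met (0.011) is already below alpha; it matters only when m0 is small relative to the number of tests.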

Keselman, Cribbie, and Holland (2002) recommended either of these two false discovery rate procedures, especially when the number of tests is large. These procedures are more powerful than variants of the Bonferroni test, for example, particularly when the number of tests is large (Keselman, Cribbie, & Holland, 1999; Williams, Jones, & Tukey, 1999). This approach is especially suitable when the research is exploratory or when each test corresponds to marginally different implications (Keselman, Cribbie, & Holland, 2002).

References

Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society B, 57, 289-300.

Benjamini, Y., & Hochberg, Y. (2000). On the adaptive control of the false discovery rate in multiple testing with independent statistics. Journal of Educational & Behavioral Statistics, 25, 60-83.

Benjamini, Y., & Liu, W. (1999). A step-down multiple hypothesis testing procedure that controls the false discovery rate under independence. Journal of Statistical Planning and Inference, 82, 163-170.

Benjamini, Y., & Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Annals of Statistics, 29, 1152-1175.

Drigalenko, E. I., & Elston, R. C. (1997). False discoveries in genome scanning. Genetic Epidemiology, 14, 779-784.

Einot, I., & Gabriel, K. R. (1975). A study of the powers of several methods of multiple comparisons. Journal of the American Statistical Association, 70, 574-583.

Halperin, M., Lan, K. K., & Hamdy, M. I. (1988). Some implications of an alternative definition of the multiple comparison problem. Biometrika, 75, 773-778.

Hochberg, Y. (1988). A sharper Bonferroni procedure for multiple tests of significance. Biometrika, 75, 800-802.

Hochberg, Y., & Benjamini, Y. (1990). More powerful procedures for multiple significance testing. Statistics in Medicine, 9, 811-818.

Holland, B., & Cheung, S. H. (2002). Family size robustness criteria for multiple comparison procedures. Journal of the Royal Statistical Society B, 64, 63-77.

Hommel, G. (1988). A comparison of two modified Bonferroni procedures. Biometrika, 75, 383-386.

Kaiser, H. F., & Dickman, K. (1962). Sample and population score matrices and sample correlation matrices from an arbitrary population correlation matrix. Psychometrika, 27, 179-182.

Keselman, H. J., Cribbie, R., & Holland, B. (1999). The pairwise multiple comparison multiplicity problem: An alternative approach to familywise/comparisonwise Type I error control. Psychological Methods, 4, 58-69.

Keselman, H. J., Cribbie, R., & Holland, B. (2002). Controlling the rate of Type I error over a large set of statistical tests. British Journal of Mathematical and Statistical Psychology, 55, 27-39.

Olejnik, S., Li, J., Supattathum, S., & Huberty, C. J. (1997). Multiple testing and statistical power with modified Bonferroni procedures. Journal of Educational & Behavioral Statistics, 22, 389-406.

Rom, D. M. (1990). A sequentially rejective test procedure based on a modified Bonferroni inequality. Biometrika, 77, 663-665.

Rothman, K. J. (1990). No adjustments are needed for multiple comparisons. Epidemiology (Cambridge, Mass.), 1, 43-46.

Sarkar, S. K., & Chang, C. (1997). The Simes method for multiple hypothesis testing with positively dependent test statistics. Journal of the American Statistical Association, 92, 1601-1608.

Saville, D. J. (1990). Multiple comparison procedures: The practical solution. American Statistician, 44, 174-180.

Schippman, J. S., & Prien, E. P. (1986). Psychometric evaluation of an integrated assessment procedure. Psychological Reports, 59, 111-122.

Shaffer, J. P. (1995). Multiple hypothesis testing: A review. Annual Review of Psychology, 46, 561-584.

Williams, V. S., Jones, L. V., & Tukey, J. W. (1999). Controlling error in multiple comparisons, with examples from state-to-state differences in educational achievement. Journal of Educational & Behavioral Statistics, 24, 42-69.

Wilson, W. (1962). A note on the inconsistency inherent in the necessity to perform multiple comparisons. Psychological Bulletin, 59, 296-300.

Yekutieli, D., & Benjamini, Y. (1999). Resampling-based false discovery rate controlling multiple test procedures for correlated test statistics. Journal of Statistical Planning and Inference, 82, 171-196.








Last Update: 6/21/2016