The false discovery rate and statistical significance

Interpreting low P values is not straightforward
Imagine that you are screening drugs to see if they lower blood pressure. You use the usual threshold of P<0.05 as defining statistical significance. Based on the amount of scatter you expect to see and the minimum change you would care about, you've chosen the sample size for each experiment to have 80% power to detect the difference you are looking for with a P value less than 0.05.
If you do get a P value less than 0.05, what is the chance that the drug truly works?
The answer is: it depends on the context of your experiment. Let's start with a scenario where, based on the context of the work, you estimate there is a 10% chance that the drug actually has an effect. What happens when you perform 1000 such experiments? Given your 10% estimate, the two column totals in the table below are 100 and 900. Since the power is 80%, you expect 80% of the 100 truly effective drugs to yield a P value less than 0.05, so the upper left cell is 80. Since you set the threshold for statistical significance at 0.05, you expect 5% of the 900 ineffective drugs to yield a P value less than 0.05, so the upper right cell is 45.

                             Drug really works   Drug really doesn't work   Total
P<0.05, “significant”                       80                         45     125
P>0.05, “not significant”                   20                        855     875
Total                                      100                        900    1000
In all, you expect to see 125 experiments that yield a "statistically significant" result, and only in 80 of these does the drug really work. The other 45 experiments yield a "statistically significant" result but are false positives or false discoveries. The false discovery rate (abbreviated FDR) is 45/125 or 36%. Not 5%, but 36%.
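The 2×2 table arithmetic above can be sketched as a small function. This is a minimal sketch of the calculation, not code from any statistics library; the function name and defaults are our own.

```python
def expected_fdr(prior, power=0.80, alpha=0.05):
    """Expected false discovery rate, per the 2x2 table arithmetic.

    prior -- probability (before the experiment) that the effect is real
    power -- chance a real effect yields P < alpha
    alpha -- significance threshold (also the false positive rate)
    """
    true_positives = prior * power           # real effects that reach P < alpha
    false_positives = (1 - prior) * alpha    # null effects that reach P < alpha
    return false_positives / (true_positives + false_positives)

print(round(expected_fdr(0.10), 2))  # the 1000-experiment example: 45/125 = 0.36
print(round(expected_fdr(0.50), 2))  # the 50:50 scenario: 0.06
```

Note that alpha enters only as the false positive rate for the ineffective drugs; the FDR itself is a ratio that also depends on the prior and the power.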
The table below, from chapter 12 of Essential Biostatistics, shows the FDR for this and three other scenarios.
Scenario                                                                    Prior probability   FDR for P<0.05   FDR for 0.045 < P < 0.050
Comparing randomly assigned groups in a clinical trial prior to treatment                  0%             100%                        100%
Testing a drug that might possibly work                                                   10%              36%                         78%
Testing a drug with 50:50 chance of working                                               50%               6%                         27%
Positive controls                                                                        100%               0%                          0%
Each row in the table above is for a different scenario, defined by a different prior (before collecting data) probability of there being a real effect. The "FDR for P<0.05" column shows the expected FDR, calculated as above. It answers the question: "If the P value is less than 0.05, what is the chance that there really is no effect and the result is just a matter of random sampling?" Note that the answer is not 5%. The FDR is quite different from alpha, the threshold P value used to define statistical significance.
The right column, determined by simulations based on work by Colquhoun (1), asks a slightly different question: "If the P value is just a little bit less than 0.05 (between 0.045 and 0.050), what is the chance that there really is no effect and the result is just a matter of random sampling?" These numbers are much higher. Focus on the row where the prior probability is 50%. In that case, if the P value is just barely under 0.05, there is a 27% chance that the effect is due to chance. Note: 27%, not 5%! And in a more exploratory situation where you think the prior probability is 10%, the false discovery rate for P values just barely below 0.05 is 78%. In that situation, a statistically significant result (defined conventionally) means almost nothing.
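Simulations of this kind can be sketched with a short Monte Carlo experiment. The sketch below uses a two-sided z-test as a simplified stand-in for the tests in the reference, with the shift z_alt chosen to give roughly 80% power at alpha = 0.05 and an assumed 50% prior; all names and parameter choices here are ours, not from the paper.

```python
import random
from statistics import NormalDist

rng = random.Random(0)
phi = NormalDist().cdf   # standard normal CDF

prior = 0.5        # assumed 50:50 chance that the drug works
z_alt = 2.80       # mean of the test statistic when the drug works (~80% power)
n_sims = 300_000

sig = {True: 0, False: 0}        # experiments with P < 0.05, keyed by "drug works"
marginal = {True: 0, False: 0}   # experiments with 0.045 < P < 0.050

for _ in range(n_sims):
    works = rng.random() < prior
    z = rng.gauss(z_alt if works else 0.0, 1.0)   # observed test statistic
    p = 2 * (1 - phi(abs(z)))                     # two-sided P value
    if p < 0.05:
        sig[works] += 1
    if 0.045 < p < 0.050:
        marginal[works] += 1

fdr_all = sig[False] / (sig[True] + sig[False])
fdr_marginal = marginal[False] / (marginal[True] + marginal[False])
print(f"FDR for P < 0.05:          {fdr_all:.0%}")        # roughly 6%
print(f"FDR for 0.045 < P < 0.050: {fdr_marginal:.0%}")   # close to the 27% above
```

The intuition the simulation makes concrete: under the null hypothesis, P values are uniformly distributed, so very few land in the narrow window just under 0.05; but under a real effect sized for 80% power, most P values are far below 0.05, so the marginal window contains a much larger share of false positives than P < 0.05 as a whole.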
Bottom line: You can't interpret statistical significance (or a P value) in a vacuum. Your interpretation depends on the context of the experiment. The false discovery rate can be much higher than the value of alpha (usually 5%). Interpreting results requires common sense, intuition, and judgment.
Reference
1. Colquhoun, D. (2014). An investigation of the false discovery rate and the misinterpretation of p-values. Royal Society Open Science, 1(3), 140216. http://doi.org/10.1098/rsos.140216