KNOWLEDGEBASE - ARTICLE #1916

Interpreting a "not statistically significant" result. Five possible explanations.

Let’s consider a simple scenario. You compare cells incubated with a new drug with control cells and measure the activity of an enzyme. Your scientific hypothesis is that the drug will increase the activity of that enzyme. You run the experiment and analyze the results with an unpaired t test. You find that the P value is large enough for you to conclude that the result is not statistically significant. Below are five possible explanations for why this happened.
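If you were setting up this comparison outside of Prism, it reduces to an ordinary two-sample (unpaired) t test. The snippet below is a minimal sketch in Python using SciPy; the enzyme-activity values are hypothetical and are included only to make the scenario concrete.

```python
# Minimal sketch of the scenario above, assuming Python with SciPy.
# The enzyme-activity numbers are made up for illustration only.
from scipy import stats

control = [4.8, 5.1, 5.3, 4.9, 5.0]   # hypothetical activity, control cells
treated = [5.2, 5.6, 4.9, 5.4, 5.1]   # hypothetical activity, drug-treated cells

# Unpaired (two-sample) t test, two-sided by default
t_stat, p_value = stats.ttest_ind(treated, control)
print(f"t = {t_stat:.2f}, P = {p_value:.3f}")
```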

  • Explanation 1: The drug didn't work. The drug did not induce or activate the enzyme you are studying, so the enzyme’s activity is the same (on average) in treated and control cells. This is, of course, the conclusion everyone jumps to when they see the phrase “not statistically significant”.
  • Explanation 2: Trivial effect. The drug may actually affect the enzyme but by only a small amount. If you used a small sample size and/or had a fair amount of experimental error, you would obtain a large P value and conclude that the difference is not statistically significant.
  • Explanation 3: Type II error. The drug really did substantially affect the enzyme's activity, but random sampling just happened to give you some low values in the cells treated with the drug and some high values in the control cells. Accordingly, the P value was large, and you conclude that the result is not statistically significant. This is called making a Type II error. How likely are you to make a Type II error? It depends on how large the actual difference is, on the sample size, and on the experimental variation. The risk of making a Type II error is 100% minus the power of the experiment (expressed as a percentage); see the power calculation sketched after this list.
  • Explanation 4: Artifact due to poor experimental design. The drug really would increase the activity of the enzyme you are measuring. But in this scenario, the drug was inactivated because it was dissolved in an acid. Since the cells were never exposed to active drug, of course the enzyme activity didn't change. The statistical conclusion was correct – adding the drug did not increase the enzyme activity – but the scientific conclusion was completely wrong. Statistical analyses are only a small part of good science. That is why it is so important to design experiments well, to randomize and blind when possible, to include the necessary positive and negative controls, and to validate all methods.
  • Explanation 5: Uninterpretable due to dynamic sample size. In this scenario, you hypothesized that the drug would not work, and you really want the experiment to validate your prediction (maybe you have made a bet on the outcome). You first ran the experiment three times, and the result (n=3) was statistically significant. Then you ran it three more times, and the pooled results (n=6) were still statistically significant. Then you ran it four more times, and finally the results (with n=10) were not statistically significant. This n=10 result (not statistically significant) is the one you present. The P value you obtain from this approach simply cannot be interpreted. P values can only be interpreted at face value when the sample size, the experimental protocol, and all data manipulations and analyses were planned in advance; the simulation sketched after this list shows why.
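Explanation 3 states that the risk of a Type II error is 100% minus the power of the experiment. The sketch below shows that arithmetic, assuming Python with statsmodels and hypothetical planning values (effect size, per-group sample size, and alpha); it is an illustration, not Prism's built-in power analysis.

```python
# Sketch of the "Type II risk = 100% minus power" relationship from Explanation 3,
# assuming statsmodels and hypothetical planning values.
from statsmodels.stats.power import TTestIndPower

# Hypothetical inputs: true difference of 0.5 SD (Cohen's d), n = 10 per group, alpha = 0.05
power = TTestIndPower().power(effect_size=0.5, nobs1=10, alpha=0.05,
                              ratio=1.0, alternative='two-sided')
print(f"Power        = {power:.0%}")      # chance of detecting the effect
print(f"Type II risk = {1 - power:.0%}")  # 100% minus power
```

Explanation 5's claim that such P values cannot be taken at face value can be illustrated with a short simulation. The sketch below assumes there is no true drug effect and uses a hypothetical stopping rule chosen to mimic the scenario: test at n = 3, 6, and 10 per group and stop as soon as P < 0.05. The fraction of experiments declared "significant" at least once ends up noticeably larger than the nominal 5%, which is why a P value from a dynamically sized experiment cannot be interpreted at face value.

```python
# Simulation of data-dependent sample sizes (hypothetical stopping rule) when the
# drug truly has NO effect, assuming NumPy and SciPy.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_experiments = 10_000
false_positives = 0
for _ in range(n_experiments):
    control = rng.normal(0, 1, 10)
    treated = rng.normal(0, 1, 10)   # same distribution: no real effect
    for n in (3, 6, 10):             # peek at the data three times
        if stats.ttest_ind(treated[:n], control[:n]).pvalue < 0.05:
            false_positives += 1
            break
# This fraction exceeds the nominal 5% false-positive rate.
print(f"Declared 'significant' at least once: {false_positives / n_experiments:.1%}")
```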

Be cautious when interpreting large P values. Don't make the mistake of instantly believing explanation 1 above without also considering the possibility that the true explanation is one of the other four listed above.

See the companion page on alternative explanations when the results are statistically significant.
