The whole point of an ROC curve is to help you decide where to draw the line between 'normal' and 'not normal'. This will be an easy decision if all the control values are higher (or lower) than all the patient values. Usually, however, the two distributions overlap, which makes the decision harder.
To help you make this decision, Prism tabulates and plots the sensitivity and specificity of the test at various cutoff values.
Sensitivity: The fraction of people with the disease that the test correctly identifies as positive.
Specificity: The fraction of people without the disease that the test correctly identifies as negative.
Prism calculates the sensitivity and specificity using each value in the data table as the cutoff value. This means that it calculates many pairs of sensitivity and specificity.
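The calculation described above can be sketched in a few lines of Python. This is only an illustration, not Prism's implementation: the function name and the data are made up, results are expressed as fractions rather than percentages, and it assumes that a result at or above the cutoff is called positive (that is, that patients tend to have higher values).

```python
def sens_spec_table(controls, patients):
    """Return (cutoff, sensitivity, specificity) for every distinct observed value.

    Assumption for this sketch: a result >= cutoff is called positive.
    """
    rows = []
    for cutoff in sorted(set(controls) | set(patients)):
        true_pos = sum(1 for x in patients if x >= cutoff)  # patients correctly called positive
        true_neg = sum(1 for x in controls if x < cutoff)   # controls correctly called negative
        rows.append((cutoff, true_pos / len(patients), true_neg / len(controls)))
    return rows


# Hypothetical data, chosen so that the two groups overlap slightly.
controls = [1.0, 2.0, 3.0, 4.0]
patients = [3.5, 5.0, 6.0, 7.0]

for cutoff, sens, spec in sens_spec_table(controls, patients):
    print(f"cutoff >= {cutoff:4.1f}   sensitivity = {sens:.2f}   specificity = {spec:.2f}")
```

Raising the cutoff trades sensitivity for specificity, which is why a single pair of numbers cannot describe the test.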
Prism displays these results in two forms. The table labeled "ROC curve" is used to create the graph of 100%-Specificity% vs. Sensitivity%. The table labeled "Sensitivity and Specificity" tabulates those values along with their 95% confidence intervals for each possible cutoff between normal and abnormal.
The area under a ROC curve is called the C statistic, the concordance statistic or the C-index. It quantifies the overall ability of the test to discriminate between those individuals with the disease and those without the disease. A truly useless test (one no better at identifying true positives than flipping a coin) has an area of 0.5. A perfect test (one that has zero false positives and zero false negatives) has an area of 1.00. Your test will have an area between those two values. Even if you choose to plot the results as percentages, Prism reports the area as a fraction.
Prism computes the area under the entire ROC curve, starting at 0,0 and ending at 100,100. Note that whether or not you ask Prism to plot the ROC curve out to these extremes, it computes the area for that entire curve.
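As a rough sketch of what "the area under the entire curve" means, the Python below builds the ROC points from 0,0 to 1,1 (fractions rather than percentages) and sums the area with the trapezoid rule. The function names and data are hypothetical, the trapezoid rule is an assumption about how the area is summed, and higher values are assumed to be abnormal.

```python
def roc_points(controls, patients):
    """ROC points as (1 - specificity, sensitivity), running from (0, 0) to (1, 1)."""
    points = [(0.0, 0.0)]  # cutoff above every value: nobody is called positive
    for cutoff in sorted(set(controls) | set(patients), reverse=True):
        false_pos_rate = sum(1 for x in controls if x >= cutoff) / len(controls)
        true_pos_rate = sum(1 for x in patients if x >= cutoff) / len(patients)
        points.append((false_pos_rate, true_pos_rate))
    return points  # the lowest cutoff calls everybody positive, giving (1, 1)


def area_under_curve(points):
    """Trapezoid-rule area under a list of (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


controls = [1.0, 2.0, 3.0, 4.0]   # hypothetical data
patients = [3.5, 5.0, 6.0, 7.0]
print(area_under_curve(roc_points(controls, patients)))  # 0.9375
```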
While it is clear that the area under the curve is related to the overall ability of a test to correctly identify normal versus abnormal, it is not so obvious how one interprets the area itself. There is, however, a very intuitive interpretation.
If patients tend to have higher test results than controls:
The area represents the probability that a randomly selected patient will have a higher test result than a randomly selected control.
If patients tend to have lower test results than controls:
The area represents the probability that a randomly selected patient will have a lower test result than a randomly selected control.
For example: If the area equals 0.80, on average, a patient will have a more abnormal test result than 80% of the controls. If the test were perfect, every patient would have a more abnormal test result than every control and the area would equal 1.00.
If the test were worthless, no better at identifying normal versus abnormal than chance, then one would expect that half of the controls would have a higher test value than a patient known to have the disease and half would have a lower test value. Therefore, the area under the curve would be 0.5.
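This interpretation can be checked directly by comparing every patient with every control. The sketch below is illustrative only: the function name and data are made up, higher values are assumed to be abnormal, and counting a tie as half a "win" is an assumption (it is what makes the result agree with a trapezoid-rule area).

```python
def auc_by_pairs(controls, patients):
    """Fraction of (patient, control) pairs in which the patient has the higher value.

    Ties count as half a pair (an assumption of this sketch).
    """
    wins = 0.0
    for p in patients:
        for c in controls:
            if p > c:
                wins += 1.0
            elif p == c:
                wins += 0.5
    return wins / (len(patients) * len(controls))


controls = [1.0, 2.0, 3.0, 4.0]   # hypothetical data
patients = [3.5, 5.0, 6.0, 7.0]

print(auc_by_pairs(controls, patients))   # 0.9375: 15 of the 16 pairs favor the patient
print(auc_by_pairs(controls, controls))   # 0.5: identical groups, a worthless test
```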
The area that Prism reports for a ROC curve is never less than 0.50. If the area is first calculated as less than 0.50, Prism will reverse the definition of abnormal from a higher test value to a lower test value. This adjustment will result in an area under the curve that is greater than 0.50.
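The reversal amounts to replacing an area A below 0.50 with 1 - A, because swapping "higher is abnormal" for "lower is abnormal" swaps the winner in every patient-control pair. A minimal sketch, with a hypothetical function name and made-up data:

```python
def auc_with_direction(controls, patients):
    """Return (area, direction), flipping the definition of abnormal if area < 0.5."""
    n_pairs = len(controls) * len(patients)
    # Area under the "higher is abnormal" definition; ties count as half.
    higher = sum((p > c) + 0.5 * (p == c) for p in patients for c in controls)
    area = higher / n_pairs
    if area < 0.5:
        return 1.0 - area, "lower is abnormal"
    return area, "higher is abnormal"


# Hypothetical data in which patients tend to have LOWER values than controls.
controls = [5.0, 6.0, 7.0, 8.0]
patients = [1.0, 2.0, 3.0, 6.5]
print(auc_with_direction(controls, patients))  # (0.875, 'lower is abnormal')
```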
Berrar and Flach point out that ROC curves must be interpreted with care, and that there is more to interpretation than looking at the AUC (1).
Prism also reports the standard error of the area under the ROC curve, as well as the 95% confidence interval. These results are computed by a nonparametric method that does not make any assumptions about the distributions of test results in the patient and control groups.
Interpreting the confidence interval is straightforward. If the patient and control groups represent a random sampling of a larger population, you can be 95% sure that the confidence interval contains the true area.
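To give a feel for how a standard error and confidence interval for the area can be obtained, the sketch below uses the widely cited approximation of Hanley and McNeil (1982). This is an assumption made for illustration: this page does not say which nonparametric method Prism uses, so the numbers need not match Prism's output. The function name is hypothetical, and clipping the interval to the range 0 to 1 is a choice made here.

```python
import math


def hanley_mcneil_se(area, n_patients, n_controls):
    """Approximate standard error of an AUC (Hanley and McNeil, 1982)."""
    q1 = area / (2.0 - area)                 # approx. P(two patients both outrank one control)
    q2 = 2.0 * area ** 2 / (1.0 + area)      # approx. P(one patient outranks two controls)
    variance = (
        area * (1.0 - area)
        + (n_patients - 1) * (q1 - area ** 2)
        + (n_controls - 1) * (q2 - area ** 2)
    ) / (n_patients * n_controls)
    return math.sqrt(variance)


area = 0.9375                 # hypothetical area from 4 patients and 4 controls
se = hanley_mcneil_se(area, n_patients=4, n_controls=4)

# Normal-approximation 95% interval, clipped to the possible range of an area.
lower = max(0.0, area - 1.96 * se)
upper = min(1.0, area + 1.96 * se)
print(f"SE = {se:.4f}, 95% CI = {lower:.3f} to {upper:.3f}")
```

With so few subjects the interval is wide, which is the main practical message of the confidence interval.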
Prism completes your ROC curve evaluation by reporting a P value that tests the null hypothesis that the area under the curve really equals 0.50. In other words, the P value answers this question:
If the test diagnosed disease no better than flipping a coin, what is the chance that the area under the ROC curve would be as high (or higher) than what you observed?
If your P value is small, as it usually will be, you may conclude that your test actually does discriminate between abnormal patients and normal controls.
If the P value is large, your data provide no convincing evidence that your diagnostic test is any better than flipping a coin to diagnose patients. Presumably, you wouldn't collect enough data to create an ROC curve until you are sure your test actually can diagnose the disease, so high P values should occur very rarely.
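The question the P value answers can be made concrete with an exact permutation test: if the test were no better than a coin flip, the patient and control labels would be interchangeable, so one can relabel the subjects in every possible way and count how often the area is as high as (or higher than) the one observed. This is a sketch for small samples only, with hypothetical function names and data. It is not a description of how Prism computes its P value, which this page does not specify.

```python
from itertools import combinations


def auc_by_pairs(controls, patients):
    """Fraction of (patient, control) pairs won by the patient; ties count as half."""
    wins = sum((p > c) + 0.5 * (p == c) for p in patients for c in controls)
    return wins / (len(patients) * len(controls))


def exact_permutation_p(controls, patients):
    """One-sided P value: share of relabelings with an area >= the observed area."""
    observed = auc_by_pairs(controls, patients)
    pooled = list(controls) + list(patients)
    as_high = total = 0
    # Try every way of choosing which pooled values are labeled "patient".
    for chosen in combinations(range(len(pooled)), len(patients)):
        chosen_set = set(chosen)
        fake_patients = [pooled[i] for i in chosen_set]
        fake_controls = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        if auc_by_pairs(fake_controls, fake_patients) >= observed - 1e-12:
            as_high += 1
        total += 1
    return as_high / total


controls = [1.0, 2.0, 3.0, 4.0]   # hypothetical data
patients = [3.5, 5.0, 6.0, 7.0]
print(exact_permutation_p(controls, patients))  # 2 of 70 relabelings, about 0.029
```

The number of relabelings grows very quickly with sample size, so for realistic data sets a large-sample approximation or a random subset of relabelings would be used instead.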
1. Berrar D, Flach P. Caveats and pitfalls of ROC analysis in clinical microarray research (and how to avoid them). Brief Bioinform. Oxford University Press; 2011 Mar 21;13(1):bbr008–97.