© 1999 GraphPad Software Inc.

The Prism Guide to Interpreting Statistical Results
This guide is excerpted from Analyzing Data with GraphPad Prism, a book that accompanies the program GraphPad Prism.

Creating contingency tables

Introduction to contingency tables

Contingency tables summarize results where the outcome is a categorical variable such as disease vs. no disease, pass vs. fail, artery open vs. artery obstructed.

Use contingency tables to display the results of four kinds of studies.

In a cross-sectional study, you recruit a single group of subjects and then classify them by two criteria (row and column). As an example, let's consider how to conduct a cross-sectional study of the link between electromagnetic fields (EMF) and leukemia. To perform a cross-sectional study of the EMF-leukemia link, you would need to study a large sample of people selected from the general population. You would assess whether or not each subject has been exposed to high levels of EMF. This defines the two rows in the study. You then check the subjects to see whether or not they have leukemia. This defines the two columns. It would not be a cross-sectional study if you selected subjects based on EMF exposure or on the presence of leukemia.

A prospective study starts with the potential risk factor and looks forward to see what happens to each group of subjects. To perform a prospective study of the EMF-leukemia link, you would select one group of subjects with low exposure to EMF and another group with high exposure. These two groups define the two rows in the table. Then you would follow all subjects over time and tabulate the numbers that get leukemia. Subjects that get leukemia are tabulated in one column; the rest are tabulated in the other column.

A retrospective study starts with the condition being studied and looks backwards at potential causes. To perform a retrospective study of the EMF-leukemia link, you would recruit one group of subjects with leukemia and a control group that does not have leukemia but is otherwise similar. These groups define the two columns. Then you would assess EMF exposure in all subjects. Enter the number with low exposure in one row, and the number with high exposure in the other row. This design is also called a case-control study.

In an experiment, you manipulate variables. Start with a single group of subjects. Half get one treatment, half the other (or none). This defines the two rows in the study. The outcomes are tabulated in the columns. For example, you could perform a study of the EMF/leukemia link with animals. Half are exposed to EMF, while half are not. These are the two rows. After a suitable period of time, assess whether each animal has leukemia. Enter the number with leukemia in one column, and the number without leukemia in the other column. Contingency tables can also tabulate the results of some basic science experiments. The rows represent alternative treatments, and the columns tabulate alternative outcomes.

If the table has two rows and two columns, Prism computes P values using either Fisher's exact test or the chi-square test, and summarizes the data by computing the relative risk, odds ratio or difference in proportions, along with 95% confidence intervals. If the table has two rows and three or more columns (or three or more rows and two columns), Prism calculates both the chi-square test and the chi-square test for trend. With larger tables, Prism calculates only the chi-square test.

How analyses of 2x2 contingency tables work

If your table has two rows and two columns, Prism computes relative risk, odds ratio and P1-P2 using the equations below:


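             Outcome 1   Outcome 2
Group 1      A           B
Group 2      C           D

$$P_1 = \frac{A}{A+B} \qquad P_2 = \frac{C}{C+D}$$

$$\text{Relative risk} = \frac{P_1}{P_2} = \frac{A/(A+B)}{C/(C+D)}$$

$$\text{Odds ratio} = \frac{A/B}{C/D} = \frac{AD}{BC}$$

$$P_1 - P_2 = \frac{A}{A+B} - \frac{C}{C+D}$$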

If any of the four values in the contingency table is zero, Prism adds 0.5 to all values before calculating the relative risk, odds ratio and P1-P2 (to avoid dividing by zero).
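As a rough illustration (not Prism's own code), the 2x2 calculations fit in a few lines of Python. The function name `two_by_two_measures` is hypothetical; the arguments a, b, c, d stand for cells A, B, C, D, with P1 = A/(A+B), P2 = C/(C+D), relative risk = P1/P2 and odds ratio = AD/BC. The 0.5 adjustment is applied only when some cell is zero, as described above.

```python
def two_by_two_measures(a: float, b: float, c: float, d: float) -> dict:
    """Relative risk, odds ratio and P1-P2 for the 2x2 table
        Group 1: a (outcome 1), b (outcome 2)
        Group 2: c (outcome 1), d (outcome 2)
    """
    # If any cell is zero, add 0.5 to every cell to avoid dividing by zero.
    if 0 in (a, b, c, d):
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    p1 = a / (a + b)  # fraction of group 1 with outcome 1
    p2 = c / (c + d)  # fraction of group 2 with outcome 1
    return {
        "relative_risk": p1 / p2,
        "odds_ratio": (a * d) / (b * c),
        "p1_minus_p2": p1 - p2,
    }
```

For example, `two_by_two_measures(10, 0, 5, 5)` would otherwise divide by zero in the odds ratio; with every cell increased by 0.5 it returns an odds ratio of (10.5 × 5.5)/(0.5 × 5.5) = 21.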

The word "risk" is appropriate when the first row is the exposed or treated group and the left column is the bad outcome. With other kinds of data, the term "risk" isn't appropriate, but you may still be interested in the ratio of proportions. Prism calculates the 95% confidence interval for the relative risk using the approximation of Katz. You can be 95% certain that this range includes the true relative risk.
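Katz's approximation works on the log scale: the interval is ln(RR) ± z × SE, with SE = sqrt(1/A - 1/(A+B) + 1/C - 1/(C+D)), and the two limits are then exponentiated. A minimal Python sketch follows; the function name is hypothetical, and applying the 0.5 zero-cell adjustment to the interval as well is our assumption, since the guide mentions it only for the point estimates.

```python
import math
from statistics import NormalDist


def relative_risk_ci_katz(a, b, c, d, level=0.95):
    """Relative risk with a Katz (log-based) confidence interval."""
    # Assumption: reuse the 0.5 zero-cell adjustment from the point estimate.
    if 0 in (a, b, c, d):
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    rr = (a / (a + b)) / (c / (c + d))
    # Katz standard error of ln(RR).
    se = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper
```

Because the interval is symmetric on the log scale, it is asymmetric around the relative risk itself.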

If your data are from a case-control retrospective study, neither the relative risk nor P1-P2 is meaningful.  Instead, Prism calculates an odds ratio and the confidence interval of the odds ratio using the approximation of Woolf. If the disease is rare, you can think of an odds ratio as an approximation of the relative risk.
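Woolf's approximation is also log-based: the standard error of ln(OR) is sqrt(1/A + 1/B + 1/C + 1/D). The sketch below is illustrative only; the function name is hypothetical, and the zero-cell adjustment inside it is the same assumption as for the relative risk interval.

```python
import math
from statistics import NormalDist


def odds_ratio_ci_woolf(a, b, c, d, level=0.95):
    """Odds ratio (a*d)/(b*c) with a Woolf (log-based) confidence interval."""
    # Assumption: reuse the 0.5 zero-cell adjustment from the point estimate.
    if 0 in (a, b, c, d):
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    odds_ratio = (a * d) / (b * c)
    # Woolf standard error of ln(OR).
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    half_width = NormalDist().inv_cdf(0.5 + level / 2) * se
    lower = math.exp(math.log(odds_ratio) - half_width)
    upper = math.exp(math.log(odds_ratio) + half_width)
    return odds_ratio, lower, upper
```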

Prism computes the P value using either the chi-square test or Fisher's exact test.
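Both P value calculations can be sketched with the Python standard library (3.8+ for `math.comb`). This is an illustration under stated assumptions, not Prism's implementation: the chi-square version omits any continuity correction, and the two-sided Fisher P value uses one common definition (the sum of the probabilities of every table with the same row and column totals that is no more likely than the observed table). The function names are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for a 2x2 table (no continuity correction) and its P value."""
    n = a + b + c + d
    # Shortcut formula; assumes no row or column total is zero.
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, P(chi-square >= x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact P value, holding all row and column totals fixed."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(n, col1)

    p_observed = prob(a)
    probs = [prob(x) for x in range(max(0, col1 - row2), min(row1, col1) + 1)]
    # Sum over tables no more likely than the observed one; the tolerance
    # keeps tables that tie with it despite floating-point rounding.
    return min(1.0, sum(p for p in probs if p <= p_observed * (1 + 1e-7)))
```

Fisher's test enumerates every possible table, so it stays exact with small counts, while the chi-square P value relies on a large-sample approximation.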

How to think about the relative risk, odds ratio and P1-P2

To understand the differences between the relative risk, odds ratio and P1-P2, consider this example. There are two groups of subjects, denoted by two rows, and two outcomes, denoted by columns:


           Progress   No progression
AZT        76         399
Placebo    129        332
Difference between proportions: In the example, disease progressed in 28% of the placebo-treated patients and in 16% of the AZT-treated subjects. The difference is 28% - 16% = 12%.

Relative risk: The ratio is 16%/28% = 0.57. A subject treated with AZT has 57% of the chance of disease progression that a subject treated with placebo has. The word "risk" is not always appropriate; think of the relative risk as simply the ratio of proportions.

Odds ratio: This is a more difficult concept. There isn't much point in calculating an odds ratio for experimental or prospective studies. When analyzing retrospective case-control studies, however, you cannot meaningfully calculate the difference between proportions or the relative risk, so the odds ratio is used to summarize the results of these kinds of studies. See a biostatistics or epidemiology book for details.
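Plugging the AZT table into the 2x2 formulas (A = 76, B = 399, C = 129, D = 332, with progression as outcome 1, so P1 is the AZT group) reproduces the figures quoted above and adds the odds ratio. This arithmetic is a worked illustration, not Prism output.

```latex
\begin{aligned}
P_1 &= \frac{76}{76+399} = \frac{76}{475} = 0.160 \quad \text{(AZT)} \\
P_2 &= \frac{129}{129+332} = \frac{129}{461} \approx 0.280 \quad \text{(placebo)} \\
P_1 - P_2 &\approx 0.160 - 0.280 = -0.120 \\
\text{Relative risk} &= \frac{P_1}{P_2} = \frac{76/475}{129/461} \approx 0.572 \\
\text{Odds ratio} &= \frac{AD}{BC} = \frac{76 \times 332}{399 \times 129} = \frac{25232}{51471} \approx 0.490
\end{aligned}
```

The odds ratio (0.49) sits noticeably further from 1 than the relative risk (0.57) because progression is not a rare outcome here (16% to 28%), so in this example the odds ratio is only a rough stand-in for the relative risk.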

How analyses of larger contingency tables work

If your table has two columns and more than two rows (or two rows and more than two columns), Prism will perform both the chi-square test for independence and the chi-square test for trend.

The chi-square test for independence asks whether there is an association between the variable that defines the rows and the variable that defines the columns.

Prism first computes the expected value for each cell. These expected values are calculated from the row and column totals and are not displayed in the results. The discrepancies between the observed and expected values are then pooled to compute chi-square, which is reported. A large value of chi-square tells you that there is a large discrepancy. The P value answers this question: If there is really no association between the variable that defines the rows and the variable that defines the columns, what is the chance that random sampling would result in a chi-square value as large as (or larger than) the one you obtained in this experiment?
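A minimal sketch of the chi-square test for independence, assuming the table is given as a list of rows of counts with no empty row or column. The expected count for each cell is its row total times its column total divided by the grand total, and the statistic has (rows - 1) × (columns - 1) degrees of freedom. To stay dependency-free, the hypothetical helper `chi2_sf` evaluates the chi-square upper tail using the closed-form expressions that exist for whole-number degrees of freedom; in practice a library routine such as `scipy.stats.chi2.sf` does the same job. All function names here are illustrative.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi-square >= x) for a whole-number df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series times the normal density.
    s = math.sqrt(x)
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term, series = s, 0.0
    for i in range(1, (df - 1) // 2 + 1):
        if i > 1:
            term *= x / (2 * i - 1)  # s^(2i-1) / (1*3*...*(2i-1))
        series += term
    return math.erfc(s / math.sqrt(2)) + 2 * density * series


def chi_square_independence(table):
    """Pearson chi-square test of independence for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were unassociated.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, chi2_sf(chi2, df)
```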

The P value from the chi-square test for trend answers this question: If there is no linear trend between row (column) number and the fraction of subjects in the left column (top row), what is the chance that you would happen to observe such a strong trend as a coincidence of random sampling? If the P value is small, you will conclude that there is a statistically significant trend.
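The trend statistic can be sketched using one common form of the test for trend (often called the Cochran-Armitage test), with row numbers 1, 2, 3, ... as scores, matching the "row number" wording above. Textbook presentations differ slightly in how the statistic is scaled, so treat this as an illustration under that assumption rather than Prism's exact computation. `left_counts` holds the number of subjects in the left column for each row and `row_totals` the row totals; both names are hypothetical. For a table with two rows and several columns, pass the top-row counts and the column totals instead.

```python
import math


def chi_square_trend(left_counts, row_totals, scores=None):
    """Chi-square test for linear trend across k ordered rows of a k x 2 table."""
    k = len(row_totals)
    # Default scores are the row numbers 1..k.
    x = scores if scores is not None else list(range(1, k + 1))
    n_total = sum(row_totals)
    r_total = sum(left_counts)
    p = r_total / n_total  # overall fraction of subjects in the left column
    x_bar = sum(n * xi for n, xi in zip(row_totals, x)) / n_total
    numerator = (sum(r * xi for r, xi in zip(left_counts, x)) - r_total * x_bar) ** 2
    spread = sum(n * xi ** 2 for n, xi in zip(row_totals, x)) - n_total * x_bar ** 2
    chi2 = numerator / (p * (1 - p) * spread)
    # The trend statistic has 1 degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

A steady rise in the left-column fraction (for example 20%, 40%, 60% across three equal-sized rows) gives a large trend chi-square, while equal fractions in every row give a trend chi-square of zero.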

For more information about the chi-square test for trend, see the excellent text, Practical Statistics for Medical Research by D. G. Altman, published in 1991 by Chapman and Hall.