I was speaking with a patient recently who explained why he thought that colon cancer screening with fecal occult blood testing (FOBT) was equivalent to colonoscopy. He wished to avoid colonoscopy for all the usual reasons, but he also believed, on logical grounds, that it was not a superior test.
I won’t get into the actual evidence behind the two methods of screening; that is not the issue for this post. Instead, I want to discuss his argument. He assumed that colonoscopy had a sensitivity (the probability of the test being positive in someone with a significant colon lesion) of 98%, and this figure is in the ballpark of believable numbers. Let’s assume that the sensitivity of FOBT is around 35% (also a reasonable ballpark figure). Since colonoscopy is performed once every 10 years and FOBT is performed yearly, he argued, after 10 years the performance of the two tests is similar, with FOBT reaching a cumulative sensitivity of about 99%.
There are two important problems with this argument. The first, which I will just mention in passing, is also the more obvious. If a test has to be repeated for 10 years to detect a cancer, that cancer may grow and become incurable during the testing period. Had it been found at year 1, it might have had a better chance of being cured.
The second problem is the focus of this post, and to examine it, we need to understand the calculation the patient was performing to get to a cumulative sensitivity of 99%. This is the same type of calculation that is often presented when thinking about the probability of an abnormal lab test due to chance alone in a battery of lab tests. The argument goes as follows:
We define “normal” in a lab test that has a continuous result (like serum sodium) as the range of values that captures 95% of healthy patients.
That means that 5% of healthy patients (or 1 in 20) will have an “abnormal” result on the test.
If we run a battery of different tests on a patient, each of which has a similarly defined normal range, the probability of a single abnormal result due to “chance” goes up.
The actual calculation of the likelihood of an abnormal test result typically confuses medical students and early residents until they’ve heard it presented repeatedly. A common assumption is that if there is a 1 in 20 chance of an abnormal result on each test, then if 20 tests are run there will definitely be an abnormal result. This is not the correct calculation. Under the usual assumptions, the probability that all the tests are normal is the probability that a single test is normal raised to the power of the number of tests.
Thus, for 20 tests it would be 0.95^20, which is about 0.36. The probability that at least one such test will be abnormal due to “chance” is 1 - 0.36, or 0.64. In other words, about 2/3 of healthy patients would be expected to have at least one abnormal test on a battery of 20 tests under these assumptions.
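For readers who want to check the arithmetic, here is a minimal sketch of the battery calculation. It assumes, as the argument above does, that every test is independent and has a 95% chance of being normal in a healthy patient; the function name is mine and is purely illustrative.

```python
def prob_at_least_one_abnormal(n_tests: int, p_normal: float = 0.95) -> float:
    """Chance that a healthy patient has at least one abnormal result.

    Assumes every test is independent of the others (the usual
    textbook assumption, which is questioned later in this post).
    """
    prob_all_normal = p_normal ** n_tests
    return 1.0 - prob_all_normal


# Show how the chance of a spurious abnormal result grows with battery size.
for n in (1, 5, 10, 20):
    print(n, round(prob_at_least_one_abnormal(n), 2))
```

Note that even at 20 tests the probability is well short of 1, which is why "1 in 20 per test, so 20 tests guarantees an abnormal result" is the wrong calculation.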
So the patient using FOBT for colon cancer screening was saying that, with a sensitivity of about 35%, he could expect a false negative rate of 0.65 per test, and that 0.65^10 (for the ten years of testing) yielded a miss rate of about 1%. This cumulative sensitivity would then be similar to that of a single colonoscopy, so why should he get the invasive procedure?
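The patient's reasoning can be written out in a few lines. This is only a sketch of his calculation, using the ballpark sensitivities quoted in this post and the same independence assumption he was making; the function name is illustrative.

```python
def cumulative_sensitivity(per_test_sensitivity: float, n_rounds: int) -> float:
    """Chance that at least one of n_rounds tests is positive.

    Assumes each round of testing is an independent event.
    """
    miss_every_round = (1.0 - per_test_sensitivity) ** n_rounds
    return 1.0 - miss_every_round


fobt_10_years = cumulative_sensitivity(0.35, 10)  # yearly FOBT for a decade
colonoscopy = cumulative_sensitivity(0.98, 1)     # one colonoscopy
print(f"FOBT x10: {fobt_10_years:.3f}  colonoscopy x1: {colonoscopy:.3f}")
```

Under these assumptions the ten-year FOBT figure comes out at roughly 0.987, which rounds to the 99% the patient quoted.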
And so we come to the second problem with the patient’s argument: each round of testing is not an independent event.
The calculations I described above for cumulative probabilities make the assumption that the individual events are independent from each other. That is, the result of one test has no influence on the others.
If you flip a fair penny three times and get three heads, there is still a 1 in 2 chance that it will come up heads on the fourth flip. But if you perform FOBT once for colon cancer and it is negative, that may be because your particular precancerous lesion doesn’t tend to bleed. If so, it is less likely to be bleeding at subsequent FOBTs than an “average” lesion would be, and so the cumulative sensitivity calculation cannot assume independent events. This is not just a theoretical issue: based on some research, repeated testing is thought to actually have a cumulative sensitivity of around 85%, not 99%.
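To see how much a lack of independence can matter, here is a toy sketch. The numbers are hypothetical and chosen only for illustration, not taken from any study: I assume 15% of lesions never bleed at all, and that the remaining lesions bleed often enough to keep the average per-test sensitivity at 35%.

```python
def cumulative_sensitivity_mixture(groups, n_rounds):
    """groups: list of (fraction_of_lesions, per_test_sensitivity) pairs.

    Within each group the rounds are treated as independent, but a
    lesion stays in its group for all rounds, so results are correlated.
    """
    total = 0.0
    for fraction, sensitivity in groups:
        total += fraction * (1.0 - (1.0 - sensitivity) ** n_rounds)
    return total


# HYPOTHETICAL split, for illustration only.
non_bleeders = 0.15
bleeder_sensitivity = 0.35 / (1.0 - non_bleeders)  # keeps the average at 35%
groups = [(non_bleeders, 0.0), (1.0 - non_bleeders, bleeder_sensitivity)]

average = sum(fraction * sens for fraction, sens in groups)
independent = 1.0 - (1.0 - average) ** 10
mixed = cumulative_sensitivity_mixture(groups, 10)
print(f"average per test: {average:.2f}")
print(f"assuming independence: {independent:.3f}")
print(f"with non-bleeding lesions: {mixed:.3f}")
```

The single-test sensitivity is identical in both calculations, yet the ten-year figure can never exceed the fraction of lesions that bleed at all. With this made-up split it lands a little under 85%, but that agreement with the figure above is by construction, not evidence.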
So the patient was miscalculating in his decision about how to be screened for colon cancer. I actually briefly discussed this with him during the appointment, but since the real issue was that he did not want a colonoscopy he was singularly uninfluenced by the math.
This plays out in other areas as well, though:
What about that standard example above regarding batteries of lab tests that all medical students and residents are taught? The tests in the battery, too, are clearly not independent events. Normal and abnormal tests tend to cluster and so it is likely that the probability of the 20th test being abnormal in a healthy patient is affected by whether the prior 19 tests included any abnormal results.
In contrast to probabilities calculated under the assumption of independent events, here we are talking about “conditional” probabilities, where we want to know the likelihood of an event given some other set of events. However, in the real world we have very few data about these situations. If, for instance, I wanted to know how likely, due to random variation, a patient with an abnormal serum sodium and chloride is to have a high serum potassium, it is extremely unlikely that I could get a high-quality answer without doing my own primary research.
Not knowing conditional probabilities when faced with non-independent events has an important consequence: diagnostic strategies might be misinterpreted if clinicians really started using a test parameter that is a favorite in the EBM community but has not really permeated the clinical world. I’ll address this in a future post.