Sidebar: Understanding screening tests: sensitivity versus specificity, false positives and false negatives, and probabilities of having a disease
There are important differences in the meanings and consequences of the words used to describe the accuracy of a diagnostic or screening test. A test’s ability to correctly identify people who have a disease (sensitivity) is different from its ability to correctly identify people who don’t (specificity). No one wants a screening test (which is done on a group of asymptomatic people) that identifies a bunch of them as being disease-free when they’re not (false negatives), or identifies a bunch of them as having the disease when they don’t (false positives). The consequences can be costly and deadly.
False positives turn people’s lives upside down and create extreme anxiety that can stay with them and their families for months to years. False positives also lead to needless, painful and costly medical tests and work-ups, biopsies and surgeries. False negatives, in contrast, miss the disease and give false reassurance, which can lead people to neglect critical symptoms and delay treatment until the disease is much more advanced.
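To make the two definitions concrete, here is a minimal Python sketch that computes sensitivity and specificity from a 2x2 tally of screening results. The counts are invented for illustration and don’t describe any real test; the function names are ours.

```python
# Hypothetical screening results for 1,000 people; the counts are invented for illustration.
true_positives = 45    # have the disease, test positive
false_negatives = 5    # have the disease, test negative (missed)
true_negatives = 855   # disease-free, test negative
false_positives = 95   # disease-free, test positive (false alarm)


def sensitivity(tp: int, fn: int) -> float:
    """Share of people WITH the disease whom the test correctly flags."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of people WITHOUT the disease whom the test correctly clears."""
    return tn / (tn + fp)


sens = sensitivity(true_positives, false_negatives)
spec = specificity(true_negatives, false_positives)
print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
```

Each measure looks at only one group of people (the sick or the healthy), which is why neither one alone says how likely a given positive result is to be real.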
The problems in these screening studies may have been missed by some because of today’s fundamental biases towards screening, among both the public and healthcare professionals, and the resulting failure to examine screening tests as carefully as we would other medical interventions. Screening tests can appear benign, and it’s easy to get caught up in promotions and believe that early detection = cure. Intuitively, screening might not seem to have any downsides. That’s a common misperception.
Justifying screening tests actually requires a more rigorous standard of evidence than is applied to other types of medical interventions, said Dr. Thomas J. Gates, associate director of the family practice residency at Lancaster General Hospital in Lancaster, PA. There are several reasons for this, as he explained in detail in a 2001 issue of American Family Physician. In fact, there are even reasons why not screening can be the more helpful and ethical choice.
All screening tests come with risks, not just the cost and discomfort of the test itself but, first of all, the inevitable false positives. Even the best screening tests available today have low positive predictive value, said Dr. Gates, meaning most positive results will turn out to be false positives. That is because most diseases that are screened for affect relatively few people in a population, so even a small false-positive rate, applied to the large number of healthy people being screened, can produce more false alarms than true cases. This does not mean those lives aren’t valued; they are the ones who might potentially benefit from screening. It means that the risks of screening tests are borne not just by those who could benefit from the tests, but by everyone, most of whom won’t benefit.
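The arithmetic behind Dr. Gates’s point can be sketched in a few lines of Python. The sketch assumes a hypothetical test with 95% sensitivity and 95% specificity (numbers chosen for illustration, not taken from any real test) and shows how the share of positive results that are real shrinks as the disease gets rarer.

```python
def positive_predictive_value(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Probability that a person with a positive result actually has the disease."""
    true_pos = prevalence * sensitivity                # sick and correctly flagged
    false_pos = (1 - prevalence) * (1 - specificity)   # healthy but flagged anyway
    return true_pos / (true_pos + false_pos)


# Assumed, hypothetical test: 95% sensitivity and 95% specificity.
SENS, SPEC = 0.95, 0.95

for prevalence in (0.001, 0.01, 0.10, 0.50):
    ppv = positive_predictive_value(prevalence, SENS, SPEC)
    print(f"prevalence {prevalence:6.1%}: {ppv:5.1%} of positives are true positives")
```

Under these assumptions, at 1% prevalence only about 16% of positive results are true positives, even though the test is right 95% of the time among both the sick and the healthy.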
Even when screening correctly diagnoses a disease early, if there is no treatment, or the treatment isn’t effective, is harmful, or doesn’t change when the patient dies, then the patient has been more harmed than helped. As Dr. Gates emphasized, it is important to remember that the ultimate purpose of screening is to reduce deaths and suffering from disease:
If improved outcomes cannot be demonstrated, the rationale for screening is lost. Early diagnosis by itself does not justify a screening program. The only justification for a screening program is early diagnosis that leads to a measurable improvement in outcome.
As we’ve discussed before, as with every other medical treatment, the only way to know if a screening test really ‘saves lives’ or helps people live longer is to do prospective randomized clinical trials and compare actual mortality rates among the screened and unscreened groups. If people wait until clinical symptoms of a cancer appear, for instance, do they die at the same time as those who were screened and learned months to years earlier that they had cancer? Does screening actually affect the natural course of the cancer and give these cancer patients more years to enjoy, or just more years as a cancer patient? Mortality rates themselves are not determined by the timing of diagnosis, but the number of years survived after diagnosis can be inflated by screening simply because the diagnosis, and with it the survival clock, starts earlier. This is called lead-time bias.
So, when oncologist Dr. Michael Baum, Emeritus Professor of Surgery and visiting professor of Medical Humanities at University College London, said last week that mammogram screening was not even effective at saving lives, he was referring in part to this lead-time bias. Failing to understand lead-time bias can make us believe that screening leads to longer “cancer survival” when it may make no difference at all to when patients die.
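Lead-time bias is easiest to see with a single hypothetical patient. In this Python sketch all ages are invented, and it assumes early detection does nothing to change the course of the cancer; screening simply moves the date of diagnosis earlier.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # All ages are hypothetical, chosen only to illustrate lead-time bias.
    age_detectable_by_screening: int
    age_symptoms_appear: int
    age_at_death: int


def survival_after_diagnosis(p: Patient, screened: bool) -> int:
    """Years from diagnosis to death; screening only moves the diagnosis earlier."""
    age_at_diagnosis = p.age_detectable_by_screening if screened else p.age_symptoms_appear
    return p.age_at_death - age_at_diagnosis


# Assumption: early detection leaves the cancer's course untouched, so the age at death is the same.
patient = Patient(age_detectable_by_screening=62, age_symptoms_appear=66, age_at_death=70)

print("screened:  ", survival_after_diagnosis(patient, screened=True), "years of 'survival'")
print("unscreened:", survival_after_diagnosis(patient, screened=False), "years of 'survival'")
print("age at death in both cases:", patient.age_at_death)
```

The screened patient appears to “survive” 8 years after diagnosis versus 4 for the unscreened one, yet dies at 70 either way. Only a comparison of death rates, not survival times, exposes the difference as an artifact.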
“The only reliable way to prove the effectiveness of a proposed screening program is to demonstrate lower rates of all-cause or disease-specific mortality in a randomly assigned screened population compared with unscreened control subjects, using intention-to-treat analysis, a so-called randomized controlled trial,” said Dr. Gates.
These randomized clinical trials also determine whether the benefits of screening outweigh the risks for the group of people who receive screening tests. In the development of medical interventions, this critical evidence comes from Phase III trials conducted before FDA approval for marketing.
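Intention-to-treat, the analysis Dr. Gates calls for, means comparing people by the group they were randomly assigned to, not by whether they actually got screened. The toy Python sketch below uses eight invented trial records purely to show that grouping; a real trial would need thousands of participants and proper statistical testing.

```python
# Hypothetical trial records: (assigned_arm, actually_screened, died_of_disease).
# Every record is invented; it only illustrates how intention-to-treat groups people.
records = [
    ("screen", True, False), ("screen", True, False),
    ("screen", False, True), ("screen", True, False),
    ("control", False, False), ("control", True, False),
    ("control", False, True), ("control", False, True),
]


def mortality_by_assigned_arm(rows):
    """Intention-to-treat: count each person in the arm they were randomized to,
    whether or not they actually got screened."""
    totals, deaths = {}, {}
    for arm, _screened, died in rows:
        totals[arm] = totals.get(arm, 0) + 1
        deaths[arm] = deaths.get(arm, 0) + int(died)
    return {arm: deaths[arm] / totals[arm] for arm in totals}


rates = mortality_by_assigned_arm(records)
print(rates)
```

The one death in the screening arm was someone who declined screening; under intention-to-treat that death still counts against the screening arm, which keeps the comparison between the two randomized groups fair.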
Once a screening test has been introduced into the marketplace, though, the complex ethical problems and decisions that patients and doctors confront don’t end. When a screening test result comes in, what does it mean? Knowing the sensitivity and specificity of a screening test doesn’t tell a doctor everything he or she needs to interpret the results and determine the likelihood that a patient with a positive or negative result actually has the disease. (The probability that a patient with a positive result actually has the disease is called the Positive Predictive Value (PPV) of a test; the probability that a patient with a negative result is actually disease-free is the Negative Predictive Value (NPV).)
To figure that out, doctors use the statistical formula called Bayes’ rule (from the 1700s!), which combines the prevalence of the disease in the population screened with the sensitivity and specificity of the particular test. Doctors Mark Ebell and Henry Barry at the Office of Medical Education Research and Development at Michigan State University describe it in more detail here. It’s not as straightforward as it might seem.
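Written out for a screening test, Bayes’ rule combines the three quantities as follows, where p is the prevalence, Se the sensitivity and Sp the specificity (the symbol names are ours):

```latex
\mathrm{PPV} = P(D \mid +) = \frac{\mathrm{Se}\, p}{\mathrm{Se}\, p + (1 - \mathrm{Sp})(1 - p)}
\qquad
\mathrm{NPV} = P(\bar{D} \mid -) = \frac{\mathrm{Sp}\,(1 - p)}{\mathrm{Sp}\,(1 - p) + (1 - \mathrm{Se})\, p}
```

The term (1 - Sp)(1 - p) is the false positives drawn from the healthy majority; when p is small it can easily outweigh Se p, the true positives.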
Dr. Gerd Gigerenzer of the Max Planck Institute for Human Development in Berlin recently asked 1,000 gynecologists what the chances are that a woman with a positive mammogram actually has breast cancer. He provided the doctors with the statistics they needed for the women in their region: 1% prevalence of breast cancer (the probability that a woman has cancer); 90% sensitivity of mammography (the probability that a woman with breast cancer will test positive); and a 9% false-positive rate (the probability that a woman without breast cancer will test positive).
Only 16% of the doctors could give the correct answer. Most doctors “grossly overestimated the probability of cancer, answering 90% or 81%,” Dr. Gigerenzer said. In fact, the proportion of doctors who got the answer right was slightly less than chance (21%)!
The correct answer: “Only one out of ten women with a positive mammogram will actually have breast cancer.” How many women are ever told that? How many women are just handed their mammography appointment card and told they need it with no discussion?
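Plugging Dr. Gigerenzer’s figures into Bayes’ rule reproduces the one-in-ten answer. A minimal Python sketch (the variable names are ours):

```python
prevalence = 0.01            # 1% of women screened have breast cancer
sensitivity = 0.90           # P(positive | cancer)
false_positive_rate = 0.09   # P(positive | no cancer), i.e. 1 - specificity

# Bayes' rule: P(cancer | positive) = P(positive | cancer) * P(cancer) / P(positive)
p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)
p_cancer_given_positive = sensitivity * prevalence / p_positive

print(f"P(cancer | positive mammogram) = {p_cancer_given_positive:.1%}")
```

The result is about 9%. The tempting answer of 90% is the sensitivity, the probability of a positive test given cancer, which is not the same thing as the probability of cancer given a positive test.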
Dr. Gigerenzer and colleague Dr. Ulrich Hoffrage gave similar problems involving four different diagnostic tests to four dozen German doctors and published their results in a 1998 issue of Academic Medicine. When the information on the tests was presented as probabilities, as it typically is (like the example above), only 10% of the doctors could correctly estimate the probability that the disease was present!
So the researchers reframed the health statistics as natural frequencies, in a more intuitive, logical progression. For example: ten out of every 1,000 women who get routine screening have breast cancer; of these 10 women with cancer, 9 will test positive; of the remaining 990 women without cancer, about 89 will falsely test positive. With this framing, more of the doctors got it right. They concluded:
This study illustrates a fundamental problem in health care: Many physicians do not know the probabilities that a person has a disease given a positive screening test—that is, the positive predictive value. Nor are they able to estimate it from the relevant health statistics when those are framed in terms of conditional probabilities, even when this test is in their own area of specialty.
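The natural-frequency framing the researchers used amounts to simple counting. This Python sketch follows the 1,000-women mammography example, rounding to whole women as that example does:

```python
women = 1000
prevalence, sensitivity, false_positive_rate = 0.01, 0.90, 0.09

with_cancer = round(women * prevalence)                         # 10 women
true_positives = round(with_cancer * sensitivity)               # 9 of them test positive
without_cancer = women - with_cancer                            # 990 women
false_positives = round(without_cancer * false_positive_rate)   # 89.1, rounded to 89

all_positives = true_positives + false_positives
share = true_positives / all_positives
print(f"{true_positives} of {all_positives} women who test positive have cancer ({share:.0%})")
```

Counting gives 9 of 98, about 9%, essentially the same answer Bayes’ rule gives from the probabilities, without any conditional-probability algebra.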
Most health information and statistics given to doctors and patients, and even published in medical journals, are reported in ways that “suggest big benefits of the featured interventions and small harms,” said Dr. Gigerenzer and colleagues at the Max Planck Institute in the current issue of Psychological Science in the Public Interest. This failure to understand how to interpret health statistics and probabilities has critical consequences for people trying to make healthcare decisions, and for the advice their doctors give them about what to do.
Yet patients and doctors alike aren’t aware of this problem, and it continues to cause undue fear among the public, they said.
Everyone who participates in screening should be informed that the majority of suspicious results are false alarms. We face a large-scale ethical problem for which an efficient solution exists yet which ethics committees… have not yet noticed… Without understanding the numbers involved, the public is susceptible to political and commercial manipulation of their anxieties and hopes, which undermines the goals of informed consent and shared decision making.
Health statistics and screening tests generate many false alarms, so to avoid unnecessary anxiety, consumers deserve to be fully informed about their real risks of developing a disease and what a positive or negative screening test means for them. When people understand their actual risks of disease, “hopes and anxieties are no longer as easily manipulated… and citizens can develop a better-informed and more relaxed attitude toward[s] their health.”
There are other ethical issues in screening that aren’t always considered. Earlier this year, we looked at the ethics of newborn screening and the truly tragic consequences for innocent young lives of screening tests that were advanced without sound evidence.
An often-neglected ethical dimension of screening, said Dr. Gates, is that screening tests represent a fundamental shift in the doctor-patient relationship. They are done on asymptomatic people, not on patients who come to their doctor seeking medical help:
In ordinary medical practice, the patient initiates an encounter because of a troubling symptom. The physician pledges to help but can make no guarantee and is not responsible if the symptom turns out to represent something beyond the ability of current medical practice to cure. By contrast, a screening test is usually initiated by the physician (or indirectly, by professional or advocacy groups) and, in this situation, there is an “implied promise” not just that the screening procedure might be beneficial, but that it is in fact beneficial, that it will do more good than harm.
That isn’t always the case.
© 2008 Sandy Szwarc