What is a fair test?
A clinical intervention trial on people has long been considered the gold standard in research, and essential for determining the effectiveness and safety of a treatment, pill or intervention. But not all studies are created equal, and some are of such poor quality that we cannot rely on them to make health decisions, either for ourselves or for others. Yet the very idea of questioning a clinical trial, or the basic skills to distinguish a biased study from one that can be trusted, is uncommon today. Certainly, those trying to promote a treatment — be it a pharmaceutical drug, preventive health intervention, or alternative modality — aren’t keen on the public having such skills or scrutinizing studies too closely.
But it's not hard to think critically about a clinical trial. The skill is amazingly like old-fashioned common sense.
Examining ‘bias’ in a clinical trial is a way to see if the study was a “fair test” of a treatment. When reputable scientists call a study weak or of poor quality, they’re not making an arbitrary claim, and they’re not part of some conspiracy to discredit a study they don’t like; the study has failed to follow basic elements of the scientific process that have, over centuries, proven to lead us to the soundest decisions.
It’s easy to design a trial, intentionally or unintentionally, that is biased and increases the likelihood the findings will support a hypothesis — we see it all of the time. Without cataloguing every type of bias, a few examples will help to illustrate bias in clinical studies on humans and help us recognize it. These examples also show why having these critical thinking skills is in our best interests.
To avoid ‘stacking the deck’ and be a fair test, a clinical trial first has to avoid bias in making comparisons.
Allocation bias of study participants. To test their treatments, researchers can select people for an intervention group who are different from those in the control group, making it impossible to know if the different outcomes were really from the treatment or because the people differed. For example, patients can be carefully screened so that the treatment arm is healthier, younger or of higher socioeconomic status, and then be compared to a group of people who are hospitalized, suffering from serious illnesses, older or of lower socioeconomic status. This allocation bias can make a treatment appear to be safer and more effective than it is. [Remember the carefully screened, healthy patients for bariatric surgery compared with older, hospitalized sick people?]
There are also countless ways that two groups of people can differ — many of them differences we cannot measure, guess or anticipate — that can influence a treatment’s apparent effects. The only way to ensure that two groups of people are likely to be similar, and that all of those unmeasured confounding factors are likely to be distributed equally between the groups, is to randomize the participants: people are randomly assigned to receive the treatment or to be in the comparison control group.
So, adequate procedures for randomization help to make a fair test.
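To see why randomization matters, here is a minimal sketch in Python, using entirely made-up numbers, of how random assignment tends to balance a risk factor the researchers never even measured, while hand-picking the healthiest people for the treatment arm does not:

```python
import random

random.seed(1)

# Hypothetical pool of 1,000 volunteers; each has an unmeasured
# "frailty" score the researchers never see (higher = sicker).
pool = [{"id": i, "frailty": random.gauss(50, 10)} for i in range(1000)]

def mean_frailty(group):
    return sum(p["frailty"] for p in group) / len(group)

# Biased allocation: quietly screen the 500 healthiest into the treatment arm.
by_health = sorted(pool, key=lambda p: p["frailty"])
biased_treatment, biased_control = by_health[:500], by_health[500:]

# Randomized allocation: shuffle the whole pool, then split it in half.
shuffled = pool[:]
random.shuffle(shuffled)
random_treatment, random_control = shuffled[:500], shuffled[500:]

print("Biased:     treatment %.1f vs control %.1f" %
      (mean_frailty(biased_treatment), mean_frailty(biased_control)))
print("Randomized: treatment %.1f vs control %.1f" %
      (mean_frailty(random_treatment), mean_frailty(random_control)))
```

The randomized split leaves the unseen frailty scores nearly identical in both arms, so a difference in outcomes is more plausibly due to the treatment itself.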
Attrition bias of study participants. A treatment cannot be fairly tested if large numbers of people drop out of the study, leaving information on only a select group. We don’t know if those not included died, suffered side effects or had other complications of the intervention that made them drop out. Attrition bias can make a treatment appear to be safer and more effective than it is. It’s impossible to know if the rosy picture presented by the remaining study participants is because most of the negative results weren’t included.
Similarly, studies that fill in the missing data by carrying forward the last available measure from the people who’d dropped out can make for an unfair test. [Diet studies that treat the 6-month weight loss of those who dropped out as having been maintained to the end of the study, rather than accurately revealing their rebound weight gain, can make a diet appear more effective, for example.]
So, low attrition rates and reporting on all study participants help to make a fair test.
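As a toy illustration (Python, with invented weights that aren't from any real trial), here is how carrying the last observation forward, or simply analyzing only the completers, papers over what happened to the people who vanished:

```python
# Hypothetical 12-month diet trial: weight change in kg at 6 and 12 months.
# None means the participant dropped out after the 6-month visit.
participants = [
    {"six_month": -8.0, "twelve_month": -6.0},
    {"six_month": -7.0, "twelve_month": -2.0},
    {"six_month": -9.0, "twelve_month": None},   # dropout; true rebound unknown
    {"six_month": -6.0, "twelve_month": None},   # dropout
    {"six_month": -5.0, "twelve_month": -1.0},
]

# Last observation carried forward: pretend the 6-month loss was maintained.
locf = [p["twelve_month"] if p["twelve_month"] is not None else p["six_month"]
        for p in participants]

# Completers-only analysis: quietly drop anyone who left the study.
completers = [p["twelve_month"] for p in participants if p["twelve_month"] is not None]

print("LOCF mean 'loss' at 12 months:  %.1f kg" % (sum(locf) / len(locf)))
print("Completers-only mean loss:      %.1f kg" % (sum(completers) / len(completers)))
# Neither figure tells us what actually happened to the people who vanished.
```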
Study duration bias. When a study is too short in duration, or is stopped prematurely before the full effects are realized, the treatment cannot be adequately evaluated. Ending a study as soon as positive results are seen, but before complications set in, can make a treatment appear safer and more effective than it actually is. [We commonly see weight loss studies designed to be short-term, well under the minimum 5 years the FTC has said is necessary to evaluate weight-loss efficacy, since that is when weight regain is realized.]
Italian researchers reported just last week in the Annals of Oncology that the number of clinical trials being stopped early, just as they start to show a benefit, has increased dramatically over recent years. Of the 14 randomized controlled clinical trials published in 2005-7 that had been prematurely stopped when they’d begun to show a benefit, 79% were used to support an application for marketing approval (through the Food and Drug Administration or European Medicines Agency). “This suggests a commercial component in stopping trials prematurely,” said one of the authors. While rushing a potentially life-saving drug to market is the rationale for ending some studies early, it risks overestimating the treatment’s effects, and too many are being stopped early without strong evidence of superior benefits. Of special concern was that 5 of the studies hadn’t even enrolled 40% of the number of patients the protocol said were needed to make a credible statistical analysis of effectiveness. “If a trial is evaluating the long-term efficacy of a treatment,” they wrote, “short-term benefits, no matter how significant statistically, may not justify early stopping. Data on disease recurrence and progression, drug resistance, metastasis, or adverse events, all factors that weigh heavily in the benefit/risk balance, could easily be missed.”
So, studies of sufficient duration help to make a fair test.
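A small simulation can show why stopping at the first 'significant' interim look inflates results. This is only a sketch in Python with hypothetical numbers, not a model of any particular trial: a treatment with a genuinely modest benefit is tested at several interim analyses, and the trials that happen to stop early report, on average, a larger effect than the true one.

```python
import random
import statistics

random.seed(2)

TRUE_EFFECT = 0.2                    # modest real benefit, in standard-deviation units
LOOKS = (100, 200, 300, 400, 500)    # interim analyses at these per-arm sizes
Z_CUTOFF = 1.96                      # naive significance threshold applied at every look

def run_trial():
    """Simulate one trial; return (per-arm size at stopping, estimated effect)."""
    treat, control = [], []
    for n in LOOKS:
        while len(treat) < n:
            treat.append(random.gauss(TRUE_EFFECT, 1.0))
            control.append(random.gauss(0.0, 1.0))
        diff = statistics.mean(treat) - statistics.mean(control)
        se = (2.0 / n) ** 0.5
        if diff / se > Z_CUTOFF:     # "benefit seen" -- stop the trial here
            return n, diff
    return LOOKS[-1], diff           # ran to the planned finish

results = [run_trial() for _ in range(2000)]
stopped_early = [d for n, d in results if n < LOOKS[-1]]
ran_to_end = [d for n, d in results if n == LOOKS[-1]]

print("True effect:                   %.2f" % TRUE_EFFECT)
print("Mean estimate, stopped early:  %.2f" % statistics.mean(stopped_early))
print("Mean estimate, ran to the end: %.2f" % statistics.mean(ran_to_end))
```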
Intervention bias. Rather than a placebo, researchers can compare a treatment to another treatment known to be less effective, ineffective or dangerous; or compare their intervention to a competitor drug at a lower, subtherapeutic dose. Intervention bias can make a treatment appear more effective or safe than it really is. While an argument can be made, in life-endangering conditions such as cancer, for comparing a new treatment to the best standard care, in preventive health trials this is not a credible rationale.
So, interventions compared to a placebo help to make a fair test.
Outcome measures. How the primary outcomes (results) are defined and measured can intentionally, or unintentionally, make a treatment appear effective when it isn’t, or lead us to mistakenly believe a treatment is more beneficial than it actually is. Using credible, verifiable and quantifiable measures that accurately reflect actual clinical outcomes is the only way to show if a treatment helped, harmed or was useless. Surrogate endpoints, such as health risk indices, for example, can give false impressions of actual health benefits that never materialize clinically.
Incidences of disease conditions, such as diabetes, can be manipulated by how the disease is defined and measured (self-reported, lab blood sugars, billing records, or clinically confirmed), and can reflect differing levels of screening between groups. Measuring only incidences of heart attacks or cardiovascular deaths, for example, without reporting all-cause mortality, can make an intervention appear more beneficial than it really is. No one wants to learn later that a treatment “worked” but more people died from cancer or other causes. With trials of long-term preventive health interventions purporting to prevent chronic diseases and improve health, the most objective and irrefutable endpoint is always all-cause mortality.
Using subjective symptoms as endpoints, such as how people report feeling, rather than actual clinical measures, can also impart bias.
So, valid and objective primary endpoints that are clinically meaningful, quantifiable and confirmable measures help to make a fair test.
Observer bias. If those participating in the study, or those observing and measuring the outcomes, know which people are getting the intervention, especially if it’s new or believed beneficial, they are more likely to interpret observations or experiences as being due to the treatment. [Remember the nonblinded study of miracle thigh cream?] When the subjects and clinical researchers are blinded and don’t know which participants are getting the treatment, it can help to make the observations more objective. When that’s not possible, independent observers can be blinded.
So, double-blind trials can help to make a fair test.
Analyses. Inappropriate statistical analysis methods and insufficient sample sizes are an entire topic of their own but, briefly, they can overstate the results or the significance of a study’s findings. This problem is so pervasive [article upcoming] that JAMA editors recently recommended outside independent statistical analyses for all clinical trials before the results are published.
So, appropriate statistical analysis can help to make a fair test.
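For a rough sense of what a 'sufficient sample size' looks like, here is a standard back-of-the-envelope calculation in Python; the blood pressure numbers are placeholders chosen for illustration, not taken from any study:

```python
from math import ceil

def per_arm_sample_size(delta, sigma, alpha_z=1.96, power_z=0.84):
    """Approximate participants needed per arm to detect a true mean difference
    of `delta` when individual values vary with standard deviation `sigma`,
    using two-sided alpha = 0.05 (z = 1.96) and 80% power (z = 0.84)."""
    return ceil(2 * (alpha_z + power_z) ** 2 * sigma ** 2 / delta ** 2)

# Hypothetical example: hoping to detect a 5 mmHg drop in blood pressure
# when readings vary with a standard deviation of 15 mmHg.
print(per_arm_sample_size(delta=5, sigma=15))   # about 142 per arm
```

An underpowered trial that nonetheless reaches statistical significance has usually done so by catching an exaggerated estimate of the effect.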
Publication bias. When practitioners look at the body of available evidence to evaluate the effectiveness of a treatment, ‘reporting bias’ can lead to inaccurate conclusions that a treatment is more effective or safer than it really is. Trials that show a treatment to be ineffective, or that fail to support the researchers’ original hypothesis, aren’t reported or published as often as those reporting successes. Negative or null results are less likely to be presented at scientific meetings or published in journals promptly, in full reports, in widely-read journals, in English, and in more than one report; and they are less likely to be cited in reports of later studies. While underreporting of research is not limited to commercial conflicts of interest, it compromises fair testing of treatments.
The importance of careful and complete reviews of existing evidence before beginning a clinical trial on human subjects also has an ethical component for the welfare of the human volunteer participants: to ensure that the intervention to be tested is biologically plausible, and that there are enough laboratory, observational and animal tests to suggest a benefit for people. A 2001 review by the Cochrane Collaboration Stroke Group made a tragic discovery after 7,665 patients had participated in 29 clinical trials of calcium antagonists for stroke, a class of drugs later recognized as clinically ineffective. The animal studies had never supported the possibility that the drugs would be helpful in humans to begin with. Worse, the published trials showing favorable results had been biased and of poor quality, in contrast with methodologically stronger unpublished studies that had found negative effects.
Relying on surrogate outcomes and failing to adequately review the medical literature can have disastrous consequences, said Dr. Iain Chalmers with the UK Cochrane Centre, NHS Research and Development Programme in Oxford, in a review of controlled trials in cardiovascular medicine. Drugs shown to reduce arrhythmias during heart attacks were accepted as standard therapy even though an earlier systematic review of controlled trials had warned of uncertainties about their effect on mortality. The drugs “continued to be used for nearly a decade,” he wrote. “At the peak of their use in the late 1980s, it has been estimated that anti-arrhythmic drugs were causing between 20,000 and 70,000 premature deaths every year in the United States alone.” It took more than 50 controlled trials of the anti-arrhythmic drugs before practitioners accepted their lethal capacity, he said. Worse, a 1980 study had found 9 times as many deaths in the treatment arm as on placebo, but it was never published and doctors lost its potential early warning.
So, systematic reviews of the evidence by researchers that include all relevant, quality trials, published and unpublished, can help to make a fair test.
It’s the real thing
Biased studies and biased reviews, built on these and other forms of bias, most often tend to favor whatever intervention is being studied or promoted and marketed. Imagine that. Critically examining interventions that are made to appear to be slam dunks is, perhaps, most important of all.
As essential as understanding bias is for helping us recognize what's real and keep from being taken in by unsound information, such critical thinking skills (aka the scientific process) aren’t understood by the politicians and bureaucrats who decide health policies, the employers and benefit administrators who institute health risk assessments and ‘wellness’ programs, or the managed care company (formerly called health insurance company) executives who manage your healthcare and decide on covered benefits.
Most worrisome, some people actually reject the notion of bias and sound clinical trials. The scientific process is precisely the type of curriculum that has been dissed in academia, with medical and nursing schools increasingly moving towards “other ways of knowing” and experiential learning.
Bias in reviews: the meta-analyses
No discussion of bias in clinical research is complete without mentioning meta-analysis. Meta-analysis is a technique of lumping together all the available evidence, in hopes of helping practitioners understand what the body of evidence suggests about the efficacy of a treatment. It is most often used when there are no large, high quality, randomized, double-blind, placebo-controlled clinical trials — the gold standard — to prove the validity of a treatment or theory. A meta-analysis can hope to create a statistically stronger estimate of an effect than any single study might have been able to show.
There’s bias in meta-analyses, too. In fact, a significant percentage of meta-analyses are biased and their findings are later contradicted in large randomized controlled trials, according to Drs. Matthias Egger, M.D., and George Davey Smith, M.D., D.Sc., at the University of Bristol, UK, in a 1995 issue of the British Medical Journal.
“Funnel plots” can help to identify bias in meta-analyses. In making a funnel plot, the estimated size of the treatment effect in each of the component studies is plotted against its sample size. Funnel plots are based on the fact that precision in estimating the treatment effect increases as the sample size of the component studies increases. Results from small studies will scatter widely at the bottom of the graph, with the spread narrowing among larger studies at the top. “If there is no publication bias, the plot should resemble a symmetrical inverted funnel [or pyramid],” they said. [Image from: Antioxidant supplements for prevention of mortality in healthy participants and patients with various diseases. Cochrane Database of Systematic Reviews 2008]
In a 1997 article, using funnel plot asymmetry, Dr. Egger and colleagues looked at how widespread publication bias was in meta-analyses. Examining 37 meta-analyses published in leading medical journals from 1993-6 and 38 meta-analyses from the Cochrane Database of Systematic Reviews 1996 second issue, they found evidence of bias in 38% of the meta-analyses in medical journals and in 13% of those from Cochrane, whose standards work to minimize bias. Small positive trials are more likely to be published than negative ones, positive studies are more often published more than once, and English-language studies are published more often than foreign-language ones. “Smaller studies are, on average, conducted and analyzed with less methodological rigour than larger studies,” Dr. Egger and colleagues said, and trials of lower quality also tend to show larger effects. All of these distort the findings of a meta-analysis. Smaller studies done on higher-risk patients can also bias meta-analyses. While randomized controlled trials are the best way to appraise the effectiveness of a treatment, “results of meta-analyses that are exclusively based on small trials should be distrusted — even if the combined effect is statistically highly significant,” they cautioned. “The appearance of misleading meta-analysis is not surprising considering the existence of publication bias and the many other biases that may be introduced in the process of locating, selecting, and combining studies,” Dr. Egger said.
As has been covered before, there are many quality problems inherent in meta-analyses. One problem is that the studies clumped together can vary widely in quality, measures, populations, methodologies and statistical analyses (with findings that are positive, null and negative).
In interpreting their conclusions, most researchers recognize that any relative risks reported from lumping such divergent studies together must be large enough — more than 200% above null — to be credible beyond chance and the computer modeling errors that attempt to account for all of the discrepancies among the studies. Still, at least two-thirds of meta-analyses are of such exceedingly poor quality that they cannot even be used to guide clinical practice, according to an evaluation of 139 meta-analyses published in the journal Critical Care. Its authors cautioned doctors to think carefully before even considering applying the results of meta-analyses in their practices.
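A funnel plot of the kind Egger and colleagues describe can be sketched in a few lines of Python with matplotlib; the 'trials' here are simulated, not drawn from any real meta-analysis:

```python
import random
import matplotlib.pyplot as plt

random.seed(3)

# Simulate 40 hypothetical trials: small trials estimate the treatment
# effect noisily, large trials precisely; the true effect is 0.3.
TRUE_EFFECT = 0.3
sizes = [random.randint(20, 1000) for _ in range(40)]
effects = [random.gauss(TRUE_EFFECT, 2.0 / n ** 0.5) for n in sizes]

# Publication bias: small trials with unimpressive results never get published.
published = [(e, n) for e, n in zip(effects, sizes) if n > 200 or e > TRUE_EFFECT]

plt.scatter([e for e, n in published], [n for e, n in published])
plt.axvline(TRUE_EFFECT, linestyle="--")
plt.xlabel("Estimated treatment effect")
plt.ylabel("Sample size")
plt.title("Funnel plot: missing small 'negative' trials make it lopsided")
plt.show()
```

With every trial included, the points would scatter symmetrically around the true effect and narrow toward the top; once the small unfavorable trials go unpublished, one side of the funnel is hollowed out.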
© 2008 Sandy Szwarc