For those of you who missed it, Gary Taubes wrote the cover story for the New York Times Magazine this past Sunday on the foibles of observational or epidemiological studies. He made all of the points that I have made in past posts about the inability of these studies to prove anything. And he elaborated on how the press misrepresents these studies and makes much more of them than they are worth.
Gary uses the hormone replacement therapy (HRT) fiasco as a case in point of how observational studies are misused. In the 1960s women began using HRT to relieve the many symptoms of menopause. Then in 1985 the authors of the Harvard Nurses’ Health Study released a report that women using HRT had only a third as many heart attacks as women who didn’t. This paper electrified the press, which reported it widely. Doctors began prescribing estrogen to women as a preventative against cardiovascular disease. Almost 20 years later studies began to emerge showing that HRT actually increased the risk for heart disease, blood clots and stroke. Now, many women entering menopause are avoiding using HRT for the one thing for which it is an appropriate treatment: the short-term (several years) relief of menopausal symptoms.
Many explanations have been offered to make sense of the here-today-gone-tomorrow nature of medical wisdom — what we are advised with confidence one year is reversed the next — but the simplest one is that it is the natural rhythm of science. An observation leads to a hypothesis. The hypothesis (last year’s advice) is tested, and it fails this year’s test, which is always the most likely outcome in any scientific endeavor. There are, after all, an infinite number of wrong hypotheses for every right one, and so the odds are always against any particular hypothesis being true, no matter how obvious or vitally important it might seem.
As discussed many times on this blog, basically what happens is this: the researchers release the results of an observational study. The press picks up on the conclusions and makes hay with them. A later double-blind, placebo-controlled study shows that the hypothesis generated by the observational study was wrong. Then the press picks up that study and makes hay with it. It’s like the old (very true) saying that no matter what the outcome of a legal dispute, the lawyers always win. In the case of medical studies, the press always wins.
How many times have you seen articles with headlines along the lines of “Substance A shown to increase the risk of cancer”? Everyone rushes to avoid Substance A, manufacturers remove it from their products, and its name becomes mud. Then a few years later comes the headline: “Researchers at such and such university show that Substance A doesn’t cause cancer.”
Why the flip-flop? Because the first study, the one showing that Substance A causes cancer, was an observational study. The second, the one absolving Substance A, was a double-blind, placebo-controlled study.
It’s axiomatic in the research community that observational studies can lead only to the development of hypotheses. In our example above, the observational study led to the hypothesis that Substance A causes cancer. But when that hypothesis was tested, it was found wanting.
The authors of these observational studies and the journals that publish them love to issue press releases implying that there is causation when they know better. And the press loves to pick up on these releases and publish them as fact.
The catch with observational studies like the Nurses’ Health Study, no matter how well designed and how many tens of thousands of subjects they might include, is that they have a fundamental limitation. They can distinguish associations between two events — that women who take H.R.T. have less heart disease, for instance, than women who don’t. But they cannot inherently determine causation — the conclusion that one event causes the other; that H.R.T. protects against heart disease. As a result, observational studies can only provide what researchers call hypothesis-generating evidence — what a defense attorney would call circumstantial evidence.
Testing these hypotheses in any definitive way requires a randomized-controlled trial — an experiment, not an observational study — and these clinical trials typically provide the flop to the flip-flop rhythm of medical wisdom. Until August 1998, the faith that H.R.T. prevented heart disease was based primarily on observational evidence, from the Nurses’ Health Study most prominently. Since then, the conventional wisdom has been based on clinical trials — first HERS, which tested H.R.T. against a placebo in 2,700 women with heart disease, and then the Women’s Health Initiative, which tested the therapy against a placebo in 16,500 healthy women. When the Women’s Health Initiative concluded in 2002 that H.R.T. caused far more harm than good, the lesson to be learned, wrote Sackett in The Canadian Medical Association Journal, was about the “disastrous inadequacy of lesser evidence” for shaping medical and public-health policy. The contentious wisdom circa mid-2007 — that estrogen benefits women who begin taking it around the time of menopause but not women who begin substantially later — is an attempt to reconcile the discordance between the observational studies and the experimental ones. And it may be right. It may not. The only way to tell for sure would be to do yet another randomized trial, one that now focused exclusively on women given H.R.T. when they begin their menopause.
The take-home lesson from all this is that correlation does not mean causation. Just because obese people wear bigger belts doesn’t mean that bigger belts cause obesity. Just because there are more cops in high-crime areas doesn’t mean that cops cause crime. Just because you see more umbrellas when it’s raining doesn’t mean that umbrellas cause rain. But that’s the way these things would be reported in observational studies.
I’m probably beating this issue to death, but it’s really important that the limitations of observational studies be understood. And sometimes showing the farcical results of a ridiculous study helps people see that the results of a not-so-ridiculous-seeming observational study can be just as farcical.
Let’s say I want to look at the belt sizes of obese people and correlate them to weight. I get a couple of hundred (or a couple of thousand; a larger sample doesn’t change the logic of an observational study) obese volunteers. I write down the belt sizes and the weights of all my subjects. I run this data through the statistical program on my laptop, and I discover that there is a direct correlation between belt size and weight. And not only is there a correlation, but the correlation is highly statistically significant. I publish my results and issue a press release pointing out my results and implying that belt size is not just correlated with obesity, but that belt size causes obesity. The press picks up on the story and publishes it under the headline: “Belt size may cause fatness.” Then unwitting overweight people rush out to buy smaller belts in an effort to treat their obesity.
Ridiculous as this sounds, it is no different than the observational study I reported on recently about red meat increasing the risk of death for victims of stage III colon cancer. No difference whatsoever.
As Taubes points out, in observational studies even the way the questions are posed can lead to different outcomes.
Even the way epidemiologists frame the questions they ask can bias a measurement and produce an association that may be particularly misleading. If researchers believe that physical activity protects against chronic disease and they ask their subjects how much leisure-time physical activity they do each week, those who do more will tend to be wealthier and healthier, and so the result the researchers get will support their preconceptions. If the questionnaire asks how much physical activity a subject’s job entails, the researchers might discover that the poor tend to be more physically active, because their jobs entail more manual labor, and they tend to have more chronic diseases. That would appear to refute the hypothesis.
So what should you do when confronted with an observational study appearing to show a correlation that applies to you? Let’s say you follow my advice and don’t worry about your intake of saturated fat. Then you read the press report of an observational study implying that eating saturated fat will increase your risk for colon cancer. What do you do?
So how should we respond the next time we’re asked to believe that an association implies a cause and effect, that some medication or some facet of our diet or lifestyle is either killing us or making us healthier? We can fall back on several guiding principles, these skeptical epidemiologists say. One is to assume that the first report of an association is incorrect or meaningless, no matter how big that association might be. After all, it’s the first claim in any scientific endeavor that is most likely to be wrong. Only after that report is made public will the authors have the opportunity to be informed by their peers of all the many ways that they might have simply misinterpreted what they saw. The regrettable reality, of course, is that it’s this first report that is most newsworthy. So be skeptical.
If the association appears consistently in study after study, population after population, but is small — in the range of tens of percent — then doubt it. For the individual, such small associations, even if real, will have only minor effects or no effect on overall health or risk of disease. They can have enormous public-health implications, but they’re also small enough to be treated with suspicion until a clinical trial demonstrates their validity.
If the association involves some aspect of human behavior, which is, of course, the case with the great majority of the epidemiology that attracts our attention, then question its validity. If taking a pill, eating a diet or living in proximity to some potentially noxious aspect of the environment is associated with a particular risk of disease, then other factors of socioeconomic status, education, medical care and the whole gamut of healthy-user effects are as well. These will make the association, for all practical purposes, impossible to interpret reliably.
This entire long, well-written article is worth taking the time to read. The part about the healthy-user effect is especially interesting. And the next time you read a report in the paper about the findings of an observational study, ignore it.