Virtually all of the results presented in medical studies are displayed as ‘average’ or ‘mean’ values. I’m sure everyone knows that the way to come up with an average or mean (the two are synonymous) value for a group of data points is to add them and divide the sum by the number of data points analyzed. For example, if you are a teacher, and you want to find out the average score on a test you gave to 30 students, you would add all the test scores together and divide by 30. You would then have the ‘mean’ or ‘average’ score of the students in your class.
Most medical papers list the mean values of whatever is being studied. If the researchers are trying to determine whether or not an experimental weight-loss therapy works, they add the weight lost by all the subjects participating in the study then divide by the number of subjects. The number they get is the ‘mean’ or ‘average’ weight loss brought about by the therapy being tested. It all sounds pretty reasonable and scientific, but is it really?
It would be realistic if we were all average people. But we’re not. And averages don’t represent us all that well. In fact, if you think about it, the average American would have one breast and one testicle.
Averages don’t always represent the true findings in a scientific experiment, either. Let’s look at an example to demonstrate. Let’s say we are testing a new weight loss regimen on 10 people. We start these people on the program, keep them on it for three months, then evaluate. When we look at the numbers we find the following results:

  • Subject #1 -4 lbs
  • Subject #2 -5 lbs
  • Subject #3 -7 lbs
  • Subject #4 -4 lbs
  • Subject #5 -2 lbs
  • Subject #6 -6 lbs
  • Subject #7 +12 lbs
  • Subject #8 -1 lb
  • Subject #9 +4 lbs
  • Subject #10 -3 lbs

If you add these numbers up and divide by 10 you find that the average or mean weight loss for the group is 1.6 pounds, which doesn’t seem like a lot. But if you look at the data itself instead of the average, you see that most of the people lost around 4-5 pounds. In fact, assuming these numbers to be accurate, if you went on this same regimen, the odds are that you would lose 4-5 pounds instead of the 1.6 pounds that the ‘mean’ of the data would predict. You could also gain 12 pounds, but that would be unlikely (a 1 in 10 chance). This is the problem with looking only at average values and not at the data as a whole.
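For anyone who wants to check the arithmetic, here is a minimal Python sketch of that calculation. The variable names are my own, and I've written weight lost as a negative number and weight gained as a positive one; that sign convention is an assumption, not something taken from any study.

```python
# Weight change in pounds for subjects #1 through #10.
# Assumed convention: negative = weight lost, positive = weight gained.
changes = [-4, -5, -7, -4, -2, -6, 12, -1, 4, -3]

# The mean: add everything up, divide by the number of subjects.
mean_change = sum(changes) / len(changes)

# Split the group by what actually happened to each person.
lost = [c for c in changes if c < 0]
gained = [c for c in changes if c > 0]

print(f"mean change: {mean_change:+.1f} lbs")
print(f"{len(lost)} of {len(changes)} subjects lost weight, {len(gained)} gained")
```

The two subjects who gained weight pull the mean toward zero, which is why it lands well below what most of the individual subjects actually lost.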
Another way to look at this data is to calculate the median, which is basically the midpoint within the data set, i.e., the point at which half of the subjects are above and the other half below. You can do this by arranging the data in ascending or descending order and finding the midpoint by lopping off from both the top and the bottom until you get to the middle.
If we do this to our data, it looks like this:

  • Subject #7 +12 lbs
  • Subject #9 +4 lbs
  • Subject #8 -1 lb
  • Subject #5 -2 lbs
  • Subject #10 -3 lbs
  • Subject #1 -4 lbs
  • Subject #4 -4 lbs
  • Subject #2 -5 lbs
  • Subject #6 -6 lbs
  • Subject #3 -7 lbs

We can see that the median falls between a loss of 3 pounds and a loss of 4 pounds, so we call it 3.5. So in this experiment the mean (average) weight loss is 1.6 pounds while the median is 3.5 pounds, about twice the mean. But if you look at the actual weights lost you can see that most clustered around the 4-5 pound level. You can see that the results of our experimental weight-loss regimen look different depending upon how they’re reported.
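The lop-off-both-ends method for finding the median can be sketched in a few lines of Python. The function name median_by_trimming is hypothetical, the sign convention (negative = weight lost) is my assumption, and the function assumes it is handed at least one value. Python's standard statistics module is used only as a cross-check.

```python
import statistics

# Weight change in pounds for the 10 subjects (negative = weight lost).
changes = [-4, -5, -7, -4, -2, -6, 12, -1, 4, -3]

def median_by_trimming(values):
    """Sort the values, then drop one from each end until one or two remain.

    Assumes a non-empty list. With two values left over, the median is
    taken as the point halfway between them.
    """
    remaining = sorted(values)
    while len(remaining) > 2:
        remaining = remaining[1:-1]
    return sum(remaining) / len(remaining)

median_change = median_by_trimming(changes)
mean_change = statistics.mean(changes)

# The hand-rolled version should agree with the standard library.
assert median_change == statistics.median(changes)

print(f"mean change:   {mean_change:+.1f} lbs")
print(f"median change: {median_change:+.1f} lbs")
```

With an even number of subjects there is no single middle value, so the two left in the middle (a 4 pound loss and a 3 pound loss) are averaged to give 3.5.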
It works this way in the real medical literature as well.
Let’s look at a study from a couple of years ago (full text pdf) that many people have used to ‘prove’ there is no metabolic advantage and to ‘prove’ that a calorie is simply a calorie irrespective of its macronutrient composition.
Boden et al studied 10 overweight patients with type II diabetes in a metabolic ward for 21 days. During the first 7 days the subjects were allowed to follow their regular diet (the control), and they were then switched to a low-carbohydrate diet (21 g carb/day) for the next 14 days. During the course of the study, numerous parameters were evaluated, including fasting glucose levels, insulin sensitivity, HbA1c, weight change, caloric intake and energy expenditure.
After the 14 days on the low-carb diet subjects lost weight and markedly improved in all parameters measured. Blood glucose levels normalized, HbA1c decreased from 7.3% to 6.8%, insulin sensitivity improved by about 75%, triglycerides dropped by 35% and total cholesterol fell by 15%. Pretty dramatic results for only 14 days on the low-carb regimen, I would say.
The subjects lost weight as well. They lost an average (mean) of 3.63 lbs (1.65 kg) while decreasing their food intake from 3111 kcal/day to 2164 kcal/day, a decrease of 947 kcal/day. Multiplying 947 by 14 gives us a total caloric deficit of 13,258 kcal. If we divide this number by 3500, the approximate number of kcal in a pound of fat, we get 3.79, which is the number of pounds of fat that should be lost simply from the caloric deficit. That is pretty close to the 3.63 lbs actually lost. The difference is insignificant, so it really is, as they say, close enough for government work.
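Here is that same back-of-the-envelope calculation as a short Python sketch. The intake and weight figures are the ones quoted above from the study; the variable names are mine, and the 3500 kcal per pound of fat figure is the usual rule of thumb rather than a measured value.

```python
# Rule-of-thumb energy content of a pound of body fat (an approximation).
KCAL_PER_LB_FAT = 3500

baseline_kcal_per_day = 3111   # mean intake on the subjects' regular diet
low_carb_kcal_per_day = 2164   # mean intake on the low-carb diet
days_on_low_carb = 14
observed_loss_lbs = 3.63       # mean weight loss reported for the group

daily_deficit = baseline_kcal_per_day - low_carb_kcal_per_day
total_deficit = daily_deficit * days_on_low_carb
predicted_loss_lbs = total_deficit / KCAL_PER_LB_FAT

print(f"daily deficit:  {daily_deficit} kcal")
print(f"total deficit:  {total_deficit} kcal")
print(f"predicted loss: {predicted_loss_lbs:.2f} lbs")
print(f"observed loss:  {observed_loss_lbs:.2f} lbs")
```

Keep in mind that every number going into this calculation is a group mean, so the agreement it shows is between averages and says nothing about any individual subject.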
The authors of the study conclude that irrespective of all the other markedly positive benefits of the low-carb diet, the “weight loss…was completely accounted for by reduced caloric intake.” In other words, there is no metabolic advantage to a low-carb diet. Weight loss simply occurs because the satiating effects of the low-carb diet bring about a spontaneous reduction in calories.
But is that the whole story here? Not really. Let’s see why.
Below is a chart from the study showing graphically what happened to caloric intake and weight during the course of the experiment.
[Chart from Boden et al: daily body weights (kg) and energy intakes (kcal/24 hr), boden-chart-only.jpg]
Look at the lines in the upper half of this chart that represent the body weight ranges of all the subjects and the lines representing the caloric intake of all the subjects. Notice that the lines for caloric intake are small while the lines representing the amount of weight loss are large. If you compare the dimensions of these lines to the scale, you find that these subjects varied their caloric intake by only about 200-250 kcal/day. But the variation in weight loss is much, much larger, which means that some subjects lost considerably more weight than would be expected from the caloric deficit while others didn’t lose as much or may have even gained a little. What this chart shows us is that there is indeed a metabolic advantage for some of these people even though on average there wasn’t for the group. And remember the post on Karl Popper: if the metabolic advantage can be shown to be present, that means the hypothesis that a metabolic advantage doesn’t exist is false.
Unfortunately, most medical articles don’t show the array of data as this one has, so all we have to go by is the average, which doesn’t always tell the whole story. One of the things I like about my favorite journal, Nutrition & Metabolism, is that the editors almost always require the raw data to be shown along with the averages, which truly allows astute readers of the medical literature to come to meaningful conclusions about the data. Only when the data are reported in this way can you really decide what happened.

22 Comments

  1. I’d just like to throw in, speaking as someone who teaches economics and statistics, that just from a stats point of view it is nearly always misleading to report these studies in terms of means. The mean is a lousy measure of central tendency if there are extreme values, and there are almost always outliers in these distributions. My casual (very very casual!) observation is that you can’t even argue that the outliers are balanced and thus cancel each other out. I really fail to see why, if you’re looking for descriptive results, researchers don’t report at least the median.
    (My other stats pet peeve is drawing conclusions from extremely small samples from populations of unknown underlying distribution, but let’s not even go there….)
    It’s nearly always misleading but that’s the way they’re almost always published. I agree with you on the drawing conclusions from small samples, too. I may post on that later. I’m glad you’re a reader. Maybe you’ll keep me from making any huge blunders since my statistics have all been self taught.

  2. Personally I think it’s because researchers are graphically illiterate, and back when they studied charts, averages was about it. They are not thinking: what is the easiest, clearest way to show people graphically what really happened? They are thinking: must have a graph, will show averages.
    All it would take to turn it around would be to read Tufte’s books on data graphics and apply the principles. William Horton also has good books on visual literacy.
    Especially with such small-sample studies, a simple scatter chart could show so much more – how much divergence is there, actual values, numbers of individuals at each result point, etc etc

  3. You take a mean, yes there is room for distorting population members, especially if your n is small. The further below 30, the more skew you will get. But, don’t most studies also include standard deviations along with averages, and P values, which would give the astute reader (not to be confused with your average health and nutrition journalist) plenty of information without presenting the whole set of data. If you did a robust study, with say an n of 200+, this would be plenty of info, and wading through 200 individual case records with multiple factors reviewed, might be very time consuming and sticky for a critical reader, whereas a report including averages, sigma values, p-values and perhaps even the median, would be completely adequate.
    My stats professor had a big beef with the median. But he was a hard core Republican. He would say things like, “These liberals will try to tell you that the median income is X like that means anything.” I tried to explain that a big gap (downward) between median and mean income might suggest some Bill Gates and Sultan of Brunei types moving the mean up, and distorting it. Guy is a brilliant finance and stats guy, but this apparently caused him some cognitive dissonance. He was also very fond of the phrase “good enough for government work”.

  4. I have a little problem with the graph in the upper half, specifically with the error bars… with such a small sample, sometimes it’s more meaningful to show the individual changes rather than grouping all the body weights in one spot. Just by looking at the lines below the dark circles one can see that there is no real significance between points. One could mistakenly conclude that there is a significant difference between day 1 and day 14 without noticing how the error bars overlap at all points. Even in the caloric intake, at least during the first few days, one can see that the error bars overlap. There are a number of ways in which one can plot graphs that show individual change. Perhaps that is also a reflection of how inappropriate it is to use the mean as the descriptive number. Fourteen days isn’t really too long and if we consider adaptation time (which Phinney estimates as being around 6 weeks!), which can be different between individuals, then the error bars may not overlap as the time of the treatment increases.
    On the caloric intake part of the graph, however, using the same reasoning one can see that the points after day 7 as a group are, indeed, significantly lower than the points before day 7, which could also tell a little something about the adaptation period in this otherwise too-short experimental time.

  5. Excellent post.
    BTW Michael, have you read The Black Swan? I highly recommend it. Nassim Nicholas Taleb talks a lot about how meaningless, er, means are in many cases. He talks about how certain variables are often modeled on the assumption that they are normally distributed, when they are anything but. I am convinced that there are many applications of his ideas in nutrition and biology. For example, Art de Vany has talked a lot about this.
    Excellent, excellent book. He may not be right on everything, but he has managed to irritate the hell out of the academics as well as the finance practitioners.
    I read it when it first came out and enjoyed it. I thought it was a little overwritten and it bogged down in places, but the info was great.

  6. “In fact, if you think about it, the average American would have one breast and one testicle.”
    I’ve always had two testicles, but since I’ve been overweight I’ve had two breasts as well. Does that make me above average?
    I suppose it does make you above average. 🙂

  7. You mentioned one metabolic ward study. Would you say this analysis applies to those other metabolic ward studies one could read about in a certain self-published tome on diet and exercise?
    It’s not one of those specific metabolic ward studies, but, yes, it does apply.

  8. My nit pick of the study is their description of the control diet as HIGH FAT, HIGH PROTEIN, AND HIGH CARBOHYDRATE. It’s like Garrison Keillor’s description of Lake Wobegon where all the children are above average.
    By my calculations, Control diet; 1156 cal from carbohydrates, 548 cal from protein, 1386 cal from fat for a total of 3090 calories which is in line for people weighing an average of 251 lbs.
    This gives 37 % cal from carbs, 18% cal from protein, 45% cal from fat.
    Compared to the USDA diet, it is already a low carb, high fat diet, albeit compared to Atkins or Protein Power, a reduced carb rather than an actual low carb diet.
    As to the main point of your piece which is that half the people lost 4 lbs or more (in only 2 weeks!), there seems to be an emerging consensus that the degree of metabolic advantage is dependent on insulin levels. Elevated insulin levels yield increased metabolic advantage. See Taubes as to explanation. Thanks for the time and space to comment.
    P.S. If a drug company came out with a pill that gave significant weight loss in only half the population, it would be hailed as a scientific breakthrough.

  9. Sorry Doc, I can’t let you get away with this one.
    I’m a big fan and it’s because you don’t (usually) try to pull this kind of trick.
    The “Daily Body Weights and Energy Intakes” graph is a misleading representation of the data. The left hand scale (Body weight, kg) doesn’t start at zero. It only shows a range from 108 to 120. This is fine as long as one keeps in mind that it shows changes only relative to other changes, not relative to the entire value (or body weight in this case). The right hand scale (Calorie intake, kCal / 24 hr) does begin at zero so it shows changes relative not only to other changes but also to the entire value.
    It is not an apples to apples comparison to compare the lines representing values on the left scale with lines representing values from the right scale and saying, “Look the lines for changes in body weight are much longer than the lines for changes in energy intake.”
    I’m not saying that there is no metabolic advantage, I believe there is because of other data that you’ve shown. But it isn’t appropriate to draw that conclusion from this graph.
    Regardless of their opinions of protein, carbs, and metabolic advantage, your readers have come to expect honest and ruthless evaluation of medical studies, showing flaws and questioning conclusions.
    And they deserve better than that chart.

  10. i have only one breast and one testicle and actually one foot but i do alright.
    I do however look a little deflated in a tux.
    Yours in balls,
    Simontly

  11. A little statistics humor:
    A statistician is a person who stands in a bucket of ice water, sticks his head in an oven and says “on average, I feel fine!”
    A statistician drowned while crossing a stream that was, on average, 6 inches deep.
    There are lies, darned lies, and statistical outliers.

  12. Although I agree with your overall point, I’m not sure you have interpreted the top graph correctly. You said, “But the variation in weight loss is much, much larger”–however, the left y-axis represents body weight, not weight loss. The standard error (or standard deviation, I don’t know what they’re using) is large in the beginning presumably because there was a wide variation in starting body weight. This variability in body weight could have continued to be the same even if every subject lost exactly the same amount of weight each day.
    Which goes to show that it would have been a lot more useful if they actually had at least shown the mean of the weight lost, rather than the mean of the body weight!

  13. http://www.timesonline.co.uk/tol/life_and_style/food_and_drink/article3559318.ece
    Have a look at the above for obvious reasons. Saatchi is insanely wealthy…advertising first with his Bro and latterly as a collector of art.
    But the eggs have it on this one.
    As bizarre as it sounds i could eat nowt but eggs with a smattering of veggies and speeces.
    All good things and again the Rad or Fad post on IF was rooty tooty.
    Rock not Pop, Sister.
    Sinc.
    In Protein Power I wrote about a New England Journal of Medicine article describing how an elderly man in a nursing home subsisted entirely on two dozen eggs per day without any ill effects and while keeping all his lipid levels normal. It can be done.

  14. Amy, above, is exactly right. The rapidly improving labs and acceptance by the patients certainly validate LC but differentiating metabolic advantage of macronutrients seems like a trivial pursuit. For me, the most remarkable finding was that these diabetic, obese patients quickly found just the right total kcal without measuring or supervision. Bravo, mother nature and the VLCD Paleolithic diet!

  15. Hi Dr E,
    I’ve just now come across this post (hat tip: Sandy Szwarc) called The March of the Zealots. It’s pretty good. http://www.numberwatch.co.uk/zealots.htm
    Also, even more off-topic, the Multi-Ethnic Study of Atherosclerosis (MESA) results are out.
    http://www.theheart.org/viewArticle.do?primaryKey=849845&nl_id=tho27mar08 (login required)
    A snip from the article:
    “The coronary artery calcium (CAC) score is a predictor of coronary heart disease not just in whites but also in blacks, Hispanics, and Chinese, a new analysis of the Multi-Ethnic Study of Atherosclerosis (MESA) study shows. Dr Robert Detrano (University of California, Irvine) and colleagues are the first to examine the relationship between the amount of coronary calcium and the incidence of coronary events in various ethnic groups; they report their findings in the March 27, 2008 issue of the New England Journal of Medicine.
    “Detrano explained to heartwire that although it is already known that the prevalence and extent of coronary calcification differ substantially among ethnic groups—for example, African Americans are known to have around 40% less calcification than whites—”what we didn’t know was whether, when there is calcification, it was as meaningful. We have shown that it is.”
    “His team found that a doubling of calcium scores increased the estimated probability of a major coronary event by around 25% in all the ethnic groups they looked at, over a follow-up period of almost four years—a measure they say adds “incremental” value to the prediction of coronary heart disease over and above standard risk factors. ”
    (Hat tip Dr Davis.)
    This is really, really strong evidence of the value of Calcium Scores. But will the boffins take notice??? (See above for unfortunate answer)
    Michael Richards
    Hey Michael–
    I just read this study this morning. If you read all the literature out there you will discover that calcium scores are very predictive of who will have a heart attack and who won’t. Those with calcium scores of zero have virtually no chance of having a heart attack (as long as they maintain a zero score) irrespective of what cholesterol levels are. In this very study there was no difference in cholesterol levels between those who developed heart disease and those who didn’t. And both groups had total cholesterol levels below 200 mg/dl, which should give the lie to the notion that cholesterol drives heart disease. Interestingly, of the group who developed heart disease, 28.4% were taking statins whereas only 16% of those in the group who didn’t develop heart disease were taking statins.
    Cheers–
    MRE

  16. And the follow-up question everybody probably has: do we have any idea what lifestyle choices affect CAC scores?
    Well, a couple of major differences between those who had heart attacks and those who didn’t were that those who did had significantly higher triglyceride levels and significantly lower HDL-cholesterol levels than those who didn’t. We know that restricting carbohydrates markedly reduces triglycerides and increasing fat intake markedly increases HDL-cholesterol. A low-carb diet is a higher-fat diet, so my recommendation is the low-carb diet.

  17. If you add these numbers up and divide by 10 you find that the average or mean weight loss for the group is 1.6 pounds, which doesn’t seem like a lot.
    Huh? When I add up the numbers, I get 48. Divided by 10, that’s a mean of 4.8 pounds lost. Right? Not 1.6.

  18. Never mind, read too quickly, I didn’t notice that a couple of the numbers represented positive weight gain.

  19. CAC and other calcification is now demonstrated to be connected with calcifying nanoparticles (nanobacteria). They somehow drag hydroxyapatite particles into the cells and form little igloos around themselves making them impervious to some form of modified tetracycline treatment (the only current “treatment”). They appear when the body is stressed in some way. Of course instead of figuring a way to prevent/alleviate the stresses (whether physical, mental or environmental) they have of course patented a new treatment.
    There is quite a lot of information here:
    http://www.nanobaclabs.com/content/what-are-cnps.htm
    Interesting. I’ll read it.

  20. Doctor, this study was not included in Colpo’s book.
    I’m not sure I understand your comment. If you are referring to the study that is the subject of this post, I don’t think I ever claimed that it was in his book.
    Cheers
