1) 1954 Work on Using Statistics to Lie Holds Up Well
Early in 2007, I first heard about a 1954 book called How to Lie with Statistics, by Darrell Huff. Though I knew next to nothing about it aside from its title, that alone made me want to read it, since it promised to be an entertaining work that I’d learn something from. While I was unable to get a copy of it then, I vowed that I would find and read it someday.
As it happened, it took over a decade for “someday” to arrive. But, in spring 2019, it finally did: at long last, I was able to read that work. And it was worth the wait, because it explains in a user-friendly manner how people can use statistics to mislead, and spells out how to tell when different graphs, facts, and figures are being used to deceive.
In terms of content, this text gives basic overviews of different ways of spreading misinformation with statistics, such as biased statistical samples, meaningless facts or data in advertising, and the post hoc fallacy (i.e., confusing correlation with causation). The author also gives easy-to-understand explanations of how people can use these to fool others. As an example, its fifth chapter includes diagrams and real-world examples (albeit ones from 1954) that effectively explain how bar graphs and line graphs can be misleading. Likewise, it largely avoids math except when absolutely necessary, such as when it goes over what mean, median, and mode represent in real life.
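To make the difference between mean, median, and mode concrete, here is a minimal Python sketch. The salary figures are invented for illustration and do not come from the book; the point is only that one very large value drags the mean far above what most people in the group earn.

```python
from statistics import mean, median, mode

# Hypothetical annual salaries at a ten-person firm (invented for illustration).
salaries = [30_000] * 5 + [35_000] * 3 + [50_000, 400_000]

average_salary = mean(salaries)    # pulled upward by the single 400,000 salary
middle_salary = median(salaries)   # half the staff earn less, half earn more
typical_salary = mode(salaries)    # the most common salary

print(f"mean:   {average_salary:,.0f}")
print(f"median: {middle_salary:,.0f}")
print(f"mode:   {typical_salary:,.0f}")
```

All three numbers can honestly be called "the average," which is why it is worth asking which one is being reported.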
This book also gives insights about statistical trickery that are just as relevant in 2021 as they were in 1954. For example, chapter 9 makes clear that exact numbers (e.g. $197.22) are easier to believe than rounded ones (such as $200), even if they’re made up. Its final chapter emphasizes that, when looking at graphs or statistical figures, there are several very important questions to ask, including what they mean, how they were worked out, what is not included in the calculations, whether whoever reported them has an agenda, and whether they make sense. A big one, though, is this: If the statistic in question is making a comparison between two things, is it comparing two related phenomena (such as factors related to TOEFL scores) or two unrelated ones (like the number of people who drown in swimming pools each year and how many films starred Nicolas Cage that year)?
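A short sketch shows why that last question matters. The two series below are made up (they are not real drowning, film, or any other data) and have nothing to do with each other, yet because both happen to drift downward over the same ten years, their Pearson correlation comes out very high. The function name pearson is my own; it is the standard textbook formula, written out by hand so the example needs nothing beyond the standard library.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

# Invented yearly values for two unrelated quantities that both trend downward.
invented_series_a = [5.0, 4.7, 4.6, 4.4, 4.3, 4.1, 4.2, 4.2, 4.1, 4.0]
invented_series_b = [8.2, 7.0, 6.5, 5.3, 5.2, 4.0, 4.6, 4.5, 4.2, 3.7]

r = pearson(invented_series_a, invented_series_b)
print(f"correlation: {r:.3f}")  # close to 1, despite no causal link
```

A high correlation here says only that both numbers moved in the same direction over time, not that one caused the other.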
However, How to Lie with Statistics was published in 1954, so the statistics and examples it cites show their age, such as figures that are rather low (e.g. on page 13, the author writes that Yale graduates from 1924 made just $25,000 per year by 1954) and startling examples (such as ones in chapters 7 and 8 that have to do with smoking, which many Americans view negatively today). Even with such obsolete illustrations and numbers, though, its lessons are still easy to understand.
Happily, others who have written about this book have used its points and insights to come up with their own up-to-date graphs. One, who wrote an article in 2012 about it in Fast Company, uses two charts (in paragraph 9) that each show the same phenomenon (venture capital-style investing in mobile technology each year from 2005 to 2010) to demonstrate how the message a graph sends depends on the y-axis:
Another person, Will Koehrsen, gave several other examples in his article, including one that purports to show a connection between the divorce rate in the U.S. state of Maine and how much margarine Americans consume on average:
As well, that writer goes over how easy it is to use bar charts to display a misleading message (via an example of changes in interest rates between 2008 and 2012), and why it’s important to pay attention to the y-axis when looking at them:
Overall, How to Lie with Statistics is an entertaining and educational book that helped me understand better how statistical data can be used to mislead, and is well worth reading.
2) Learning How to Call Baloney on 21st Century Misinformation
Although How to Lie with Statistics is a highly informative work that is simple to understand, it costs money to buy, and its figures and examples are dated. The free website Calling Bullshit solves both of these problems while expanding on what that book covered.
Originally offered in spring 2017 as a University of Washington class, this site goes over topics covered in How to Lie with Statistics, such as correlation and causation, misleading line graph axes, and questionable bar graphs. Also, it covers concepts related to bull, including what it is, how to spot it, how to debunk it, and ethics of calling it. Finally, it gives overviews of real-world issues related to humbug, such as problems with research, scientific misconduct, predatory academic journals, and fake news.
For those who are not comfortable with profanity, the creators made a version of said site called Calling Bull that uses the word “bull” in place of “bullshit.” However, the lectures and several of the readings use the latter term, so users should be aware of this.
Not only that, but the course gives quite a few examples of misleading statistics. One involves a graph, published in The Conversation in 2015, that purports to show a relationship between how old musicians are when they die and what music genre they perform in:
Based solely on this chart, it appears that, whereas musicians in genres like blues, jazz, and country live as long as other Americans, others (such as those in pop and rock) have shorter lifespans, with a few (particularly those in rap and hip hop) seeming to die in their 30s. Not only that, but an accompanying table suggests that rap and hip hop artists have a markedly higher likelihood of dying by homicide than singers in other genres, as shown below:
However, as the course’s creators explain, the graph’s conclusions are inaccurate for two reasons: a) newer music genres (e.g. rock, metal, and hip hop) have not been around for a long time; and b) the data include only musicians who have already died, not those still living. In a young genre, the musicians who have died so far are necessarily people who died young, and the deaths of young people are more likely to be from homicide or suicide than from heart disease.
“This pattern reflects, to some extent, a confound in the data: musicians who are dying youngest belong to newer genres (electronic, punk, metal, rap, hip-hop) that have not existed as long as genres such as jazz, country, gospel and blues. Consequently, they have not had the same opportunity to live a full lifespan.”
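The confound can be reproduced with a small, fully deterministic sketch. Everything in it is a hypothetical assumption of mine rather than anything from the course or the original study: every musician debuts at age 20, one cohort debuts per year, and both genres share exactly the same spread of lifespans. The only difference is the year each genre began. Counting only those who have died by 2015 still makes the newer genre look far deadlier.

```python
from statistics import mean

OBSERVATION_YEAR = 2015
# Hypothetical assumption: musicians in every genre share the same lifespans.
LIFESPANS = range(25, 95, 5)  # 25, 30, ..., 90 years

def observed_mean_age_at_death(genre_start_year, debut_age=20):
    """Mean age at death, counting only musicians already dead by 2015."""
    ages_at_death = []
    for debut_year in range(genre_start_year, OBSERVATION_YEAR + 1):
        birth_year = debut_year - debut_age
        for lifespan in LIFESPANS:
            # Musicians who are still alive never enter the data set.
            if birth_year + lifespan <= OBSERVATION_YEAR:
                ages_at_death.append(lifespan)
    return mean(ages_at_death)

true_mean_lifespan = mean(LIFESPANS)
older_genre = observed_mean_age_at_death(1900)  # hypothetical older genre
newer_genre = observed_mean_age_at_death(1980)  # hypothetical newer genre

print(f"true mean lifespan:     {true_mean_lifespan:.1f}")
print(f"older genre (observed): {older_genre:.1f}")
print(f"newer genre (observed): {newer_genre:.1f}")
```

In the newer genre nobody can yet have died at 70 or 80, so the observed average is low even though, by construction, the genre carries no extra risk.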
Another example of deceptive statistics they go over (this one in lecture 6) is a graph from a news opinion blog that shows average worldwide temperatures from 1880 to 2015:
Looking at the graph itself, it appears that the temperatures have barely changed over the past 135 years, and that there is nothing to concern us about human-caused global warming (let alone that there is any kind of climate crisis). However, as the lecturer makes clear, this one is misleading because the y-axis has a very wide range (-10°F to 110°F) that minimizes real changes in average temperatures worldwide. He also says that a better graph of this data (with a narrower y-axis range that closely fits the data) looks something like this:
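To put rough numbers on the lecturer's point about axis range, the sketch below (my own illustration, not the lecturer's graph) computes how much of a chart's vertical space a change in the data occupies. The temperature values are hypothetical placeholders for a rise of about 2°F, and the function name and the tight range of 56°F to 59°F are my own choices; only the wide range of -10°F to 110°F comes from the example above.

```python
def share_of_axis_used(values, axis_min, axis_max):
    """Fraction of the y-axis height spanned by the data's own range."""
    data_range = max(values) - min(values)
    return data_range / (axis_max - axis_min)

# Hypothetical average temperatures in °F, rising by 2 degrees overall.
temperatures_f = [56.5, 56.8, 57.2, 57.9, 58.5]

wide_axis_share = share_of_axis_used(temperatures_f, -10, 110)
tight_axis_share = share_of_axis_used(temperatures_f, 56, 59)

print(f"wide axis (-10 to 110): {wide_axis_share:.1%} of the plot height")
print(f"tight axis (56 to 59):  {tight_axis_share:.1%} of the plot height")
```

On the wide axis the whole change is squeezed into under 2% of the plot's height, which is why the line looks flat; on an axis fitted to the data, the same change fills most of the chart.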
A third example of statistical trickery has to do with advertising, namely claims of hot cocoa being 99.9% caffeine free.
The course’s creators make the point that, by weight, regular coffee is also 99.9% caffeine free, meaning that this kind of assertion about hot chocolate doesn’t mean much.
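The arithmetic behind that point is easy to check. The figures below are approximate assumptions of mine, not numbers from the course: a cup of brewed coffee weighing roughly 240 grams and containing roughly 95 milligrams of caffeine.

```python
# Approximate, assumed values for one ordinary cup of brewed coffee.
CUP_MASS_GRAMS = 240.0      # about 8 fluid ounces of liquid
CAFFEINE_MILLIGRAMS = 95.0  # a commonly cited ballpark figure

caffeine_grams = CAFFEINE_MILLIGRAMS / 1000.0
caffeine_share = caffeine_grams / CUP_MASS_GRAMS
caffeine_free_percent = 100.0 * (1.0 - caffeine_share)

print(f"caffeine by weight: {100.0 * caffeine_share:.3f}%")
print(f"caffeine free:      {caffeine_free_percent:.2f}%")
```

Under these assumptions, regular coffee clears the 99.9% bar too, because caffeine makes up only a tiny share of any drink's weight.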
Like a college class, this website includes a syllabus (along with links to recommended readings, such as Frankfurt’s 1986 paper “On Bullshit”), lecture recordings (broken up into 5-15 minute YouTube videos), tools for looking critically at graphs (among other things), and case studies involving real-world nonsense. However, all its material (including videos and subtitles) is in English, meaning that anyone who wants to complete it should have a strong command of academic English. Also, several lectures (including the third, fourth, and sixth ones) use some math and statistics, so knowing about those two topics helps when completing those lessons.
In addition, it is self-paced and user-controlled, and it has no homework, papers, or tests; not even the readings or lectures are required. As a result, anyone who is interested in it can do as much or as little of it as they want, without being graded or otherwise judged. On the other hand, as it’s a website rather than a university course, it is impossible to get university credit for completing it.
Finally, though it goes over a number of topics related to bull (such as how to recognize it, Occam’s Razor, and garbage in/garbage out), it is merely an introduction rather than an in-depth analysis. Not only that, but there are other subjects (like media sensationalism, clickbait, deepfakes, and conspiracy theories) that it either gives just a basic overview of or does not mention at all.
Overall, Calling Bull is a useful and informative introduction to misinformation, and serves as an update to How to Lie with Statistics.
Matt Ehlers has taught math in Tanzania as a Peace Corps Volunteer, taught English in South Korea and Nicaragua, and worked as a tutor for college students in Oklahoma. He is currently an MA TESOL candidate at SIT Graduate Institute. In his free time, he enjoys reading, history, photography, and learning about cognitive psychology.