Every high school student in America is taught this simple formula:
mean = (x1 + x2 + ... + xn) / n
This is the mean, or average, of a set of numbers, and it’s the most widely used descriptive statistic. To determine the mean, one simply adds up all the observations in a set of numbers and divides by the number of observations. Teaching students how to calculate the mean is a great way to teach young people how to use a formula and determine a statistic. Students can start by learning that the mean of 10 and 20 is 15 and go from there.
Students will have to calculate the mean over and over again for the rest of their school careers. It is also the most common statistic used outside of school. In just the last week, news articles have discussed an eclectic array of averages, from the average amount of revenue a Kura Sushi location brings in to the average number of sexual partners a resident of Doncaster, England, has had in their lifetime. (This was seriously the third result in a Google News search. The average is 12.8 for men and 12.1 for women.) Averages are everywhere.
The problem with averages, however, is that they can give a misleading snapshot of a dataset. As the joke goes, if Bill Gates walks into a bar, the average person in the building is now a billionaire. That’s because outliers can have a tremendous effect on an average. Take 10 normal people sitting in a bar with an average wealth of $150,000. If Bill Gates, whose net worth is around $120 billion, walks in, the average skyrockets to over $10 billion. So one could say that the average person in that bar is a billionaire and be mathematically correct, even though it’s a deeply misleading statistic.
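The bar arithmetic is easy to check. Here is a minimal sketch in Python; it assumes every one of the 10 patrons has exactly $150,000 (the example only gives their average) and uses the rough $120 billion figure for Gates.

```python
from statistics import mean

# Assumption: all 10 patrons have exactly $150,000, which matches the stated average.
patrons = [150_000] * 10
gates = 120_000_000_000  # roughly $120 billion

mean_before = mean(patrons)
mean_after = mean(patrons + [gates])

print(f"Average wealth before Gates walks in: ${mean_before:,.0f}")
print(f"Average wealth after Gates walks in:  ${mean_after:,.0f}")
```

One person out of eleven moves the average from $150,000 to roughly $10.9 billion.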
The same could be said about the viewership of Econsoapbox.com. I could say that my average number of views for a post in June was 634. This average, however, is being significantly raised by my recent post on the Japanese stock market, which was linked on Marginalrevolution.com (big thanks to Tyler Cowen and welcome to new subscribers!) and led to thousands of views. While the average certainly has its uses, it also has some major drawbacks.
The lesser-used alternative to the mean is the median. A median is just the middle observation in a set of numbers once they are sorted from smallest to largest. This reduces the influence of outliers. If five people have incomes of $50,000; $80,000; $90,000; $110,000; and $150,000, the median income for that group of people is $90,000. If the wealthiest person wins the lottery and now has an income of $15,000,000 instead of $150,000, the median of the group is still $90,000. The median allows one to find the typical outcome, rather than just dividing the total by the size of the group.
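To see how differently the two statistics react to the lottery win, here is a short sketch using the five incomes above. The variable names are just for illustration.

```python
from statistics import mean, median

# The five incomes from the example, before and after the lottery win.
incomes = [50_000, 80_000, 90_000, 110_000, 150_000]
after_lottery = [50_000, 80_000, 90_000, 110_000, 15_000_000]

print(f"Median before: ${median(incomes):,.0f}")
print(f"Median after:  ${median(after_lottery):,.0f}")
print(f"Mean before:   ${mean(incomes):,.0f}")
print(f"Mean after:    ${mean(after_lottery):,.0f}")
```

The median stays at $90,000 in both cases, while the mean jumps from $96,000 to over $3 million.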
In many cases, the median is superior. For example, consider the mean vs. median income of a US family. In 2021 the mean income for a family was $121,840. If that sounds high, it’s because any income average is going to be heavily influenced by outliers. The least income a family can earn is $0, but the maximum is limitless. As in the Bill Gates example, the super-rich are going to skew the average. This is especially important in a country like the United States, where income inequality is high relative to comparable countries. But this average doesn’t tell you what the typical family earns. To find the income at which half of American families earn less and half earn more, the median is needed. The median family income in the US is $88,590. That is a big difference!
So if the median is superior to the mean, why is the mean used more? There are a few possible causes. First, there isn’t a simple formula for the median, so its application in a school setting is limited. Means are a great way to teach students a simple formula, while finding the median just means lining up a group of numbers from smallest to largest and determining which is in the exact middle. Second, I don’t think a lot of middle school teachers understand the difference. I remember asking my math teacher why we had to determine both median and mean and was told, “It’s just a different way of finding the middle.”
Is the mean ever better than the median? Absolutely. If observations are being summed to form a total, then the mean is generally better. For example, when determining annual revenue for all Kura Sushi locations, it is the average that needs to be multiplied by the number of locations, not the median. The reasoning is straightforward: if the median revenue for a location is $3 million, you might think that 100 locations would have a combined revenue of $300 million. But that would not take into account high outliers, so if a few locations earn far more than the rest, the true total revenue of 100 locations would be larger than $300 million.
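A small sketch makes the point concrete. The revenue figures below are made up for illustration (they are not real Kura Sushi numbers): five hypothetical locations, one of which is a high outlier.

```python
from statistics import mean, median

# Hypothetical annual revenue per location, in thousands of dollars (invented numbers).
revenues = [2_500, 2_800, 3_000, 3_200, 6_000]
n = len(revenues)

true_total = sum(revenues)
total_from_mean = mean(revenues) * n      # recovers the true total
total_from_median = median(revenues) * n  # misses the high outlier

print(f"True total:        {true_total:,}")
print(f"Mean x locations:  {total_from_mean:,}")
print(f"Median x locations: {total_from_median:,}")
```

Multiplying the mean by the number of locations always gives back the true total, because that is how the mean is defined. Multiplying the median does not, and here it comes up short.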
In some cases, neither median nor mean is going to do the job. Say you want to know the typical number of cavities that a patient has when visiting the dentist. The median number of cavities per patient is probably zero. But all that means is that at least 50 percent of patients don’t have any cavities. The mean number of cavities per patient might be around 0.20, because while most patients have none, there will be some patients who have 2+ cavities when they visit, bringing up the average. Neither the median nor the mean tells you much because of all the patients with no cavities. Instead of the mean or median, the most useful statistic here would likely be the percentage of patients a dentist sees that have a cavity.
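Here is a rough sketch of the dentist example. The patient counts are invented, chosen only so that the mean lands on the 0.20 guessed at above.

```python
from statistics import mean, median

# Hypothetical cavity counts for 100 patients (invented numbers):
# 88 patients with none, 6 with one, 4 with two, 2 with three.
cavities = [0] * 88 + [1] * 6 + [2] * 4 + [3] * 2

median_cavities = median(cavities)
mean_cavities = mean(cavities)
# Share of patients with at least one cavity.
share_with_cavity = sum(1 for c in cavities if c > 0) / len(cavities)

print(f"Median cavities per patient: {median_cavities}")
print(f"Mean cavities per patient:   {mean_cavities:.2f}")
print(f"Patients with a cavity:      {share_with_cavity:.0%}")
```

With these made-up numbers the median is 0 and the mean is 0.20, but the figure that actually describes the waiting room is that 12 percent of patients have a cavity.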
In sum (ha ha), when trying to find the typical amount of something, use the median. When summing or aggregating, use the mean. When there are a lot of zeros, find the percent of non-zero observations. Better yet, instead of relying on a single statistic, look at several.