What exactly is the difference between machine learning and statistics?

March 08, 2013

I often get asked about the difference between statistics and machine learning. It is a tricky distinction because some things that were invented for ML (e.g., PAC theory) also get a lot of play in statistics journals, and vice versa.

To say they’re completely equivalent (which is what I often hear) is probably a bit too strong. I tend to think of ML as mostly a contribution to the general problem of statistical inference, particularly as it is applied computationally. In contrast, statistics as a field is certainly not entirely devoted to inference.

That distinction goes a long way to explaining why they are different fields, but there are a bunch of idiosyncrasies that are worth pointing out specifically because they are illustrative.

One thing ML people should get credit for is that they are usually transparent about borrowing heavily from statistics and CS, and typically do not claim to have invented a new field. No doubt this saves them many angry letters from statisticians. On the flip side, statistics has benefited greatly from ML as well, especially (at least in my opinion) from its focus on empiricism. Since it's actually fairly common for people in each field to be confused about what the other does, it's worth noting that this cross-fertilization does happen.

