Why Compilers Don’t Autocorrect “Obvious” Parse Errors


Last month, someone on Twitter relayed a conversation with their 8-year-old daughter, who is learning Python. The kid wanted to know: “If the computer knows I’m missing a semicolon here, why won’t it add it itself?”*

The face I imagine the 8 year old making at her code while asking this question.

*It turns out that the Twitter person meant “colon,” not semicolon, but for most programming languages, “semicolon” still exemplifies the point: if the compiler “knows” what the problem is, why doesn’t it just fix it?

I thought I’d try to answer this question in a way that an 8-year-old might appreciate. It turns out a lot of people outside that age range also appreciate it, so I decided to make it better and blog about it.

Lemme start here: the computer runs a program to understand YOUR program.

That program is called a compiler. Python also has an interpreter: a program that runs your program. But this error is a compilation error: the compiler caught it while trying to understand your program.

Compilers can be very, very complicated. There are lots of opinions about how to write them. You can even find different compilers for the same language! For example, the compiler your computer has for Python is probably the ‘standard’ one. We call it CPython because it is written in C, another programming language.

There is also Jython, written in Java (another programming language), and PyPy, a Python compiler written in Python! For now, we won’t get into how PyPy works. That’s a thing an 8-year-old might enjoy looking up on her own.

Anyway, handling compilation errors.

Have you ever heard the saying that “every happy family is the same, but every unhappy family is unhappy in its own way”? What it means is that there are a lot more ways to get something wrong than there are to get it right.

And the number of people who write Python, globally, is 8.2 million!

Imagine that. Writing a compiler that has to catch the ways 8.2 million people mess up. A colon seems obvious when it’s just you coding and the compiler is right about what’s wrong with your program.

But imagine the compiler is 99% right when it flags a colon error. (Remember, it is hard to predict how things will go wrong, so 99% is pretty good.)

Let’s pretend 8.2 million people each make one colon error a day.

8,200,000 × 0.01 = 82,000.

82,000 times the compiler is wrong, daily. 
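Here is that back-of-the-envelope math as a tiny Python sketch. The variable names are mine, and the numbers are the same pretend numbers from above: 8.2 million people, one colon error each per day, and a compiler that is right 99% of the time.

```python
programmers = 8_200_000   # the figure used in this post
accuracy_percent = 99     # the pretend accuracy of the colon error

# One colon error per person per day, so the misses are the leftover 1%.
# Whole-number math keeps the result exact.
wrong_per_day = programmers * (100 - accuracy_percent) // 100

print(wrong_per_day)  # 82000
```

Try changing accuracy_percent to see how good the compiler would have to be before the number of daily misses starts to feel small.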

How bad is that?

Well, there are three ‘risk amplifiers’ to consider when you are deciding how to deal with things that could go wrong. Each one makes the risk ‘worse.’

1. It’s catastrophic (breaks very important things)
2. It’s likely (it happens a lot)
3. It’s insidious (it could go uncaught) 

So in the case of the Python colon error, we’re talking about this number of 82,000 a day. That’s a made-up number, but it illustrates the point that Python is used by enough people that even RARE mistaken compilation errors happen pretty often.

They’re likely, in other words. Most of them are not too catastrophic, right? They’re easy to fix, most of the time, and even in the 82,000 cases where the compiler is mistaken about what, exactly, the programmer has done wrong, drawing attention to that area will help the programmer figure it out. 

Now let’s talk about insidiousness. This is the most under-appreciated and, probably for that reason, often the most dangerous of the risk amplifiers. Example: NASA lost the Mars Climate Orbiter in 1999 because one piece of software reported values in English units while another piece of software expected metric units, and no one caught the mismatch until the spacecraft was already lost.

Even in cases where errors are rare, we want mechanisms to catch them. And the compiler’s colon error being wrong sometimes, even if it’s usually right, is already likely because of the sheer number of people writing colons in Python.

What if the compiler adds a colon when that’s the wrong thing…in a space launch program? There are a couple of programming languages that are notorious for this kind of thing: most famously JavaScript, and to some extent Ruby.

These languages will try with all their might to divine something runnable from what you wrote. How kind of them, right? But the thing is, that can make it really, really hard to figure out why your program is not working properly, because it’s still doing something. Just, it’s the wrong thing.
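Python itself mostly refuses to guess, so to show the difference I’ve made up two tiny helpers (lenient_number and strict_number are hypothetical names, not part of any library). The lenient one guesses when it sees something it can’t understand. The strict one complains. This is only a sketch of the idea, not how any real language is implemented.

```python
def lenient_number(text: str) -> float:
    """Hypothetical 'helpful' parser: guess 0 when the text isn't a number."""
    try:
        return float(text)
    except ValueError:
        return 0.0  # silently guesses instead of reporting the problem


def strict_number(text: str) -> float:
    """Strict parser: let the ValueError reach the programmer."""
    return float(text)


readings = ["12.5", "7.5", "1O"]  # the last one has a letter O, not a zero

# The lenient version runs fine, but the typo'd reading quietly became 0.
lenient_total = sum(lenient_number(r) for r in readings)
print(lenient_total)

# The strict version stops and points at the bad value.
strict_failed = False
try:
    sum(strict_number(r) for r in readings)
except ValueError as err:
    strict_failed = True
    print("strict parser complained:", err)
```

The lenient total looks like a perfectly normal number, which is exactly what makes the mistake hard to notice later.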

The wat video shows some funny, salient examples of this that have made it one of the most popular and long-lived “joke” conference talks I’ve ever seen.

How do we manage this risk?

So a common, risk-averse approach in compiler design is to surface compilation errors to the programmer, and let the programmer—you—figure out exactly what’s wrong.

Because, as smart as we compiler designers think we are, you, dear programmer, know your program better than we do. We think we know what’s wrong. We’re even pretty sure. But we don’t know, and we don’t assume.

We say:

“We think something is wrong here. We think you want a colon. But we want you to look at it, too. Because you might know something we don’t, and we don’t want to make your program wrong by accident.” 

Here’s an example of Python getting the colon compilation error wrong:

It says it wants a colon. The actual problem here is that I typo’d “and” to “nd.” But as you can see, there actually IS a colon in the right spot.

Here’s the fixed version.
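If you want to poke at this yourself, here is a minimal sketch. The if line is one I made up to match the “nd” typo, not necessarily the exact line I ran, and syntax_error_for is a hypothetical helper. It uses Python’s built-in compile(), which only parses the code and doesn’t run it. The wording of the error message varies between Python versions, so the sketch prints whatever your interpreter says rather than assuming it.

```python
def syntax_error_for(source: str):
    """Return the SyntaxError Python raises for source, or None if it parses."""
    try:
        compile(source, "<example>", "exec")  # parse only, nothing is executed
    except SyntaxError as err:
        return err
    return None


# Made-up example lines: "and" typo'd to "nd", then corrected.
broken = "if x > 0 nd y > 0:\n    print('both positive')\n"
fixed = "if x > 0 and y > 0:\n    print('both positive')\n"

err = syntax_error_for(broken)
print("line", err.lineno, "->", err.msg)  # message wording depends on version
print(broken.splitlines()[0])             # note the colon is already there
print(syntax_error_for(fixed))            # None: the real fix was "nd" -> "and"
```

Whatever the message says, the only change between the broken and fixed versions is one letter, and it isn’t a colon.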

This’ll happen 81,999 more times today.

If the compiler tried to automatically add a colon, I’d have two colons, and the code would be even wronger.

Or it might do so over and over, resulting in a never-ending string of colons and a hung compiler! (“Hung” means “never finishes running and we have to shut it down manually”). 
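Here is a rough sketch of what that runaway could look like. naive_autofix is a hypothetical “helpful” fixer I made up for illustration: every time the code fails to parse, it adds a colon to the end of the first line and tries again. I gave it an attempt limit, which is the only thing that keeps the sketch from running forever on the made-up “nd” line.

```python
def compiles(source: str) -> bool:
    """Return True if Python can parse the source, False on a SyntaxError."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


def naive_autofix(source: str, max_attempts: int = 5):
    """Hypothetical fixer: keep adding a colon until the code parses.

    Returns the patched source and whether it finally compiles.
    """
    first_line, rest = source.split("\n", 1)
    for _ in range(max_attempts):
        if compiles(first_line + "\n" + rest):
            break
        first_line += ":"  # "fix" it the way the error message suggested
    result = first_line + "\n" + rest
    return result, compiles(result)


broken = "if x > 0 nd y > 0:\n    print('both positive')\n"
patched, ok = naive_autofix(broken)

print(patched.splitlines()[0])  # the first line, now with extra colons
print("compiles:", ok)
```

No number of extra colons can help here, because the parser trips on “nd” before it ever reaches the end of the line. Raising max_attempts just makes the string of colons longer.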

The compiler avoids causing those kinds of things by leaving it up to the programmer to decide what the problem is, and accepting the ‘cost’ that in most cases, the programmer will be like ‘Ahp, yep, missing a colon. Lemme add that.’

If you liked this piece, you might also like:

This piece I wrote on what causes insidious bugs

This transcript of a talk I gave on analyzing application risk

This introductory series on compiler design
