All whitespace is significant.
It might not always matter to your computer, or compiler, or piece of code. But to you, a human reading the code, it is significant.
I often hear people complaining about significant whitespace. They say it makes no sense, that it makes working with the code harder. That whitespace, specifically in code, should not be significant. But the unavoidable truth is that whitespace is always significant, regardless of the language you use.
Before we talk about its significance, we need to define what whitespace is.
We’ll start with the Wikipedia definition of a whitespace character:
[…] any character or series of characters that represent horizontal or vertical space in typography. When rendered, a whitespace character does not correspond to a visible mark, but typically does occupy an area on a page.
Next, we’ll divide it into 2 categories - visible and invisible.
Visible whitespace is the whitespace you can see. Indentation, spacing, line breaks… All of them make for visible whitespace.
The words in this line are separated by spaces. This line is indented using 4 spaces. An empty line precedes this line!
Invisible whitespace is all the whitespace you can’t see. This is not because it’s using different characters, but because it is positioned where it does not directly move other characters.
This line ends with 4 spaces. There are 2 line breaks after this line.
Since you cannot see it, it’s hard to make sense of it. This is the source of many complaints.
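One way to make invisible whitespace countable is to ask the machine what it sees. The sketch below is only an illustration; the helper name count_trailing_spaces is made up for this example.

```python
def count_trailing_spaces(line):
    """Return how many space characters sit at the end of a line."""
    return len(line) - len(line.rstrip(" "))


lines = [
    "This line ends with 4 spaces.    ",
    "This line ends with none.",
]

for line in lines:
    # repr() prints the exact characters, quotes included,
    # so the trailing spaces become visible to the reader.
    print(repr(line), count_trailing_spaces(line))
```

Both lines look the same when rendered, but the count differs.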
In addition to visible and invisible whitespace, there’s another category. Indistinguishable whitespace.
This category includes, for the most part, tabs and spaces.
Here we indent with 4 spaces. Here we indent with a tab.
Since they look the same, but are not the same character, they cause many issues.
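A small sketch of why they are indistinguishable to a reader but not to a machine. The helper leading_whitespace is a hypothetical name used only here, and the tab width of 4 is an assumption about how an editor might render the tab.

```python
def leading_whitespace(line):
    """Return the run of whitespace characters that starts the line."""
    return line[: len(line) - len(line.lstrip())]


with_spaces = "    Here we indent with 4 spaces."
with_tab = "\tHere we indent with a tab."

# Rendered with a tab width of 4, both indents occupy the same columns...
same_on_screen = leading_whitespace(with_tab).expandtabs(4) == leading_whitespace(with_spaces)

# ...but the underlying characters are different.
same_characters = leading_whitespace(with_tab) == leading_whitespace(with_spaces)

print(same_on_screen, same_characters)
```

The first comparison models what the human sees, the second what the machine sees.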
There are 2 ways for whitespace to be significant. It can be human-significant, meaning that it is significant for the reader; and it can be machine-significant, meaning that it matters to the computer.
Most (if not all) whitespace complaints stem from disparity between those two concepts. From cases where whitespace is human-significant and not machine-significant, or vice versa.
Going forward, we’ll call situations where human- and machine-significance match “matched significance”, and cases where they do not “mismatched significance”.
Let’s look at some examples.
This line has spaces in it. Lines are separated by line breaks. We can have multiple spaces. Or multiple line breaks.
With the exception of invisible whitespace (trailing spaces or line breaks), there is no mismatch.
In Python, whitespace is significant for both the human and the machine in defining scopes. The second line is indented, marking it a part of the function defined on the first line.
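A minimal sketch of matched significance in Python. The names greet and message are invented for illustration.

```python
def greet():
    return "hello"  # indented, so it is part of greet()


message = greet()  # not indented, so it is back at module level
print(message)
```

The reader and the interpreter agree on where the function ends, and both get that from the indentation alone.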
Since human readers cannot see the 2 trailing spaces, there’s a mismatch. We expect one output, but the computer gives us another.
This is Apple’s goto-fail bug and it is one of my favourite examples of significant whitespace.
In line 12 there’s a goto fail statement.
Due to the indentation (whitespace!) it reads (to the human) as if it belongs in the same block as line 11.
But since indentation is insignificant whitespace in C, the computer ignores it.
To be more explicit in C’s syntax, we’ll write it as follows:
Making it clear that it will always goto fail.
I like this example as it is a significant security issue that was (at least in part) caused by whitespace.
So we know whitespace is significant. Both to humans and to machines. Taking a look at any code formatter, we’ll also see that people like it that way. If I put these 2 different formatting options to a vote, I’m pretty sure which one will win:
After all, none of us really want to count matching braces.
So why does “significant whitespace” get so much hate?
Well, consider the following:
Both of these examples look valid, but they aren’t. To the naked (human-) eye, they are indistinguishable from valid code. But they mismatch spaces and tabs. Two indistinguishable types of whitespace.
This is, as mentioned before, the cause of most of the issues people have with whitespace. They expect it to work, but it doesn’t. In addition to that, there’s no meaningful or straightforward way to detect it when you look at the code. It’s an invisible problem.
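To see the mismatch without relying on your eyes, you can hand such code to the Python 3 compiler. This is a sketch under assumptions: the two source strings and the helper indentation_error are made up for this example, with the tab written as an explicit \t so that it is visible here.

```python
# Line 2 is indented with 4 spaces, line 3 with a single tab.
SPACES_THEN_TAB = "def f():\n    x = 1\n\treturn x\n"

# The same function, indented with spaces only.
ONLY_SPACES = "def f():\n    x = 1\n    return x\n"


def indentation_error(source):
    """Return the name of the indentation error raised on compile, or None."""
    try:
        compile(source, "<example>", "exec")
    except IndentationError as exc:  # TabError is a subclass of IndentationError
        return type(exc).__name__
    return None


print(indentation_error(SPACES_THEN_TAB))
print(indentation_error(ONLY_SPACES))
```

In an editor that renders tabs 4 columns wide, the two sources look identical. Only the first one is rejected.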
I am not going to tell you to stop hating significant whitespace. Whitespace hurt you, and that anger needs to be directed somewhere. I will ask you, though, to point it in the right direction.
Visible, distinguishable whitespace, with matching human- and machine-significance, is a good thing. It helps you make sense of the code, and helps ensure that the computer makes the same kind of sense of it as well.
Invisible, yet machine-significant whitespace is bad. It leads to surprising outcomes and confuses the human writing the text.
Visible, yet machine-insignificant whitespace is also bad. It leads to surprising outcomes and is tricky to detect.
Indistinguishable, yet significant whitespace is the worst. It leads to bugs, errors, and pain. Use whatever tool is available in your toolbox to fight it. Use linters, formatters, and if all else fails - in-editor “visible whitespace” features. Avoid writing systems that allow it. And remember - the problem with Python is not that indentation is significant, it’s that tabs are allowed.
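If no linter is at hand, even a few lines of code can surface the worst offenders. The following is a rough sketch and not a replacement for a real linter or formatter. The function name find_whitespace_issues and the issue labels are invented for this example, and it assumes spaces are the preferred indentation.

```python
def find_whitespace_issues(text):
    """Return (line number, issue) pairs for hard-to-see whitespace."""
    issues = []
    for number, line in enumerate(text.splitlines(), start=1):
        # The indentation is the leading run of spaces and tabs.
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if " " in indent and "\t" in indent:
            issues.append((number, "mixed tabs and spaces in indentation"))
        elif "\t" in indent:
            issues.append((number, "tab indentation"))
        if line != line.rstrip(" \t"):
            issues.append((number, "trailing whitespace"))
    return issues


sample = "def f():\n    x = 1  \n\treturn x\n"
for number, issue in find_whitespace_issues(sample):
    print(f"line {number}: {issue}")
```

The sample has two trailing spaces on its second line and a tab indent on its third, so both are reported.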