This Sunday, during the monthly meeting of the Israeli WG21 National Body discussion forum1, we discussed a paper by Yehezkel Bernat. The paper discusses a specific issue with C++ ranges and iterators. It demonstrates it with the following code sample:
The code in the example does the following:
- Create an infinite2 range counting from 0 upwards (
- Filters that to only return numbers smaller than 11 (
- Takes the first 11 numbers - so 0 through 10 (
- Loops over those and prints them.
But there is a problem.
Instead of printing all 11 numbers and terminating,
it prints them, and then keeps running indefinitely3.
This happens because after taking the 11th element, it also tries to increment towards the 12th.
Since the range is not exhausted,
iota keeps running through numbers.
Those numbers keep get filtered out by
filter as they are all
11 or larger, and so we never terminate.
If you come from other languages (like Python or Java), this should surprise you. In those other languages, similar code will work perfectly well.
This happens due to the design of iterators in C++.
*it, to get a value from the iterator;
++it, to increment the iterator.
When iterating, we start with
it pointing to the first element of our sequence.
This means that we first use
*it to get the current value,
and only then use
++it to advance.
That’s why we had an issue in our above example - we read the 11th value, then tried to advance the iterator to an element that will never exist.
Additionally, we can’t skip the increment. If we do, a subsequent read from the iterator will repeat the current value.
The iterator design in C++ is significantly different from other languages. While C++ does read-then-increment, the common design in other languages is increment-then-read.
__next__() to get the next value of an iterator,
StopIteration if none exist.
hasNext() to check if a value exists (and advance to it),
next() to get the value.
In both cases, there’s no reason to advance after we got the value we want. As a result - the issue won’t reproduce.
I don’t know the reason behind C++’s design. When looking at it, it feels like a derivative of the C-style loop,
Where the increment happens after each iteration. To me, this also seems consistent with the design of the range-based for loop in C++. And while it makes sense for iterating over pre-existing data, it feels a bit off to me when iterating calculated (or fetched) data. In those cases, Python and Java’s design feels more appropriate to me.
Another thing that comes to mind is Arno Schödl’s talk Why Iterators Got It All Wrong from CppNorth 2022.
It discusses the design of iterators and how they mix up pointing to elements (
begin() points to the first element)
and borders (
end() point after the last element).
It seems to me that if
begin() were to point before the first element
(and therefore need to be incremented before being dereferenced) the issue would be resolved.
Then again, I’m probably missing something.
I recommend that you watch the talk and see what you think.