Often, we write code to complete a task. Other times, we write code to learn something new. But there are also those rare occasions where our goals are simpler. When we have a dumb idea stuck in our head, and the only way to get it out is to code it. Or when we just want to see, to prove to ourselves, that we can write a certain piece of code.
This post is about one of those wonderful occasions. Specifically - I had to see whether I could implement C++ Scoped Enumerations in Python. And, since you’re reading this post, it’s fair to assume that I succeeded.
If you’re not a C++ programmer (good for you!) you may be wondering - what are scoped enumerations?
Well, first and foremost - they are enumerations. A nice syntactic sugar to define a group of named, constant integer values. They look like this:
Secondly, they are scoped.
This might seem obvious to anyone not coming from C (or older C++), but this means that the names only exist under the enumeration that defines them.
You can write Colors::Red to get the Red value, but just writing Red somewhere won’t get it.
Yes, this seems obvious, but it wasn’t the case until C++11.
That’s about it.
Of course, we can talk about the fact that they are a form of strong typedef, and disallow implicit conversions, and more…
But we’re here to bury, er, implement enums, not to praise them.
Why implement this in Python, you might ask.
After all, we have enum.Enum, and that’s plenty good:
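For reference, a minimal sketch of the standard-library way. The member names mirror the Colors example used throughout this post; using auto() is just one option, and it numbers members starting from 1.

```python
from enum import Enum, auto


class Colors(Enum):
    # auto() picks the next value for us, starting at 1.
    Red = auto()
    Green = auto()
    Blue = auto()


print(Colors.Red)         # Colors.Red
print(Colors.Blue.value)  # 3
```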
And, well, you’re right. There’s no reason to do it. It’s a bad idea. But… It’s plenty fun to try.
So in this post we’ll make sure we can write the following example, and then go even further.
This is our first goal.
We want to make sure that
Colors.Red would be
This might look like an impasse: if we try to access Red before defining it, we get a NameError.
But Python is a very versatile language, and while it does its best to make well-behaved code act reasonably, it also allows for some very fun behaviour when you abuse its mechanisms.
Among those fun-to-abuse mechanisms, we have metaclasses.
By implementing the __prepare__ method in our metaclass (read about it in the Python docs and on Brett Cannon’s blog), we can specify the namespace object to use when constructing the actual class (instead of the default dict).
This means that during class construction, every name lookup will go through our object.
By creating our own implementation of __getitem__, we can define any new variable we encounter, thus avoiding the NameError and defining all the members we wanted.
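A minimal sketch of this first step could look like the following. The names AutoNamespace and last_value, the choice to start counting at 0 (as C++ does), and the decision to skip dunder names are all assumptions of this sketch, not necessarily what the final implementation uses.

```python
class AutoNamespace(dict):
    """Class-body namespace that defines every unknown name it is asked for."""

    def __init__(self):
        super().__init__()
        self.last_value = -1  # assumption: the first member gets 0, as in C++

    def __getitem__(self, name):
        # Leave dunder names alone, so the implicit lookup of __name__
        # still falls back to the module globals.
        if name.startswith("__") and name.endswith("__"):
            return super().__getitem__(name)
        if name not in self:
            self.last_value += 1
            super().__setitem__(name, self.last_value)
        return super().__getitem__(name)


class ScopedEnumMeta(type):
    @classmethod
    def __prepare__(mcs, name, bases, **kwargs):
        # The returned object is used as the namespace of the class body.
        return AutoNamespace()

    def __new__(mcs, name, bases, namespace, **kwargs):
        # Hand a plain dict to type() once the body has run.
        return super().__new__(mcs, name, bases, dict(namespace))


class Colors(metaclass=ScopedEnumMeta):
    Red
    Green
    Blue


print(Colors.Red, Colors.Green, Colors.Blue)  # 0 1 2
```

Each bare name in the class body is an ordinary name lookup, so it lands in __getitem__ and gets defined on the spot.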
The next step would be allowing a bit more control for the programmer. You see, some C++ programmers want to set explicit values in their enumerations:
And if C++ can do it, so should Python:
To do this, we need to extend our
You see, if we run it now, we get:
That’s because we forgot to update our last_value member on assignment.
To do that, we’ll add some code into __setitem__ as well:
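One way this could look, as a sketch: the namespace records the assigned value in last_value, so the next bare name continues counting from it. The class name EnumNamespace and the dunder filtering are assumptions of this sketch.

```python
def _is_dunder(name):
    return name.startswith("__") and name.endswith("__")


class EnumNamespace(dict):
    def __init__(self):
        super().__init__()
        self.last_value = -1

    def __getitem__(self, name):
        # Unknown, non-dunder names become new members.
        if not _is_dunder(name) and name not in self:
            self.last_value += 1
            super().__setitem__(name, self.last_value)
        return super().__getitem__(name)

    def __setitem__(self, name, value):
        # Explicit assignments reset the counter (the value is assumed to be an int).
        if not _is_dunder(name):
            self.last_value = value
        super().__setitem__(name, value)


class ScopedEnumMeta(type):
    @classmethod
    def __prepare__(mcs, name, bases, **kwargs):
        return EnumNamespace()

    def __new__(mcs, name, bases, namespace, **kwargs):
        return super().__new__(mcs, name, bases, dict(namespace))


class Colors(metaclass=ScopedEnumMeta):
    Red
    Green = 10
    Blue


print(Colors.Red, Colors.Green, Colors.Blue)  # 0 10 11
```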
The next natural step in our progression is to allow assigning various constants into our enumeration. This includes global values, as well as members previously defined in the same enum.
Or in Python:
If we try to print the values now, we get a weird result:
There are two problems here. Let’s understand them together, and see what they mean for our next steps.
For one thing,
None instead of the value of
For another, we have SOME_OS_ERROR as a member variable!
How did that happen?
If we look at our enum class definition, we can classify name usages as reads and writes:
For every read, we call __getitem__, and for every write we call __setitem__.
So, step by step:
- We try to read
__getitem__ and defining it
- We do the same for
- We call
"SOME_OS_ERROR", defining the member
- We assign
None (the value we return from __getitem__)
With that, we know what went wrong. But fixing it is going to be a bit tricky.
So far, we treated every read, or __getitem__ call, as a member definition.
As soon as we allow assigning from existing variables, however, this falls apart.
We need to find a new logic to it, a way to differentiate the places where a read means a new member,
from where a read just means “this is the right-hand-side of an assignment”.
Let’s consider some representative cases:
| Case | First name | Second name | Third name |
|---|---|---|---|
| E1 | 1 read. Defined. | - | - |
| E2 | 2 reads. Defined and assigned from. | Assigned to | - |
| E3 | Assigned to | - | 1 read. Assigned from |
| E4 | 2 reads. Defined and assigned from. | Assigned to | 1 read. Assigned from |
From this, we can deduce some rules. The first is straightforward - if a name is assigned to, we need to create that variable. The second is trickier. We need to count the number of times a variable is read, and the number of times it is used in an assignment. If the number of reads is larger than the number of usages in assignments, we need to define it.
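The counting rule can be sketched as a small helper. Everything here is illustrative: the function name members_to_define, the input shapes, and the names A, B and C are assumptions of this sketch, and it ignores member order for brevity.

```python
from collections import Counter


def members_to_define(reads, assignments):
    """Apply the two rules to a summary of a class body.

    reads:       every name that was read, in order
    assignments: (target, [names used on the right-hand side]) pairs
    """
    read_counts = Counter(reads)
    used_counts = Counter(name for _, used in assignments for name in used)
    # Rule 1: anything assigned to becomes a member.
    targets = {target for target, _ in assignments}
    # Rule 2: more reads than uses-in-assignments means a bare definition.
    bare = {name for name, count in read_counts.items() if count > used_counts[name]}
    return bare | targets


# A body like "A; B = A + C": A is read twice and used once, C is read once and used once.
print(sorted(members_to_define(["A", "A", "C"], [("B", ["A", "C"])])))  # ['A', 'B']
```

Here C is never defined as a member, so it has to come from somewhere outside the enum.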
This means that we need to go line-by-line, counting, before we know what names we actually define. This will require post-processing, which we’ll get to later. But first, we must learn to count!
Essentially, there are two different types of name-reads in our enums. One defines a new member, and one is used in an assignment:
To correctly define members, we need to tell those cases apart.
While it’s easy to see the difference, the mechanics of code evaluation make it a bit trickier.
For reads, regardless of their being a member-definition or a use-in-assignment, we get a call to __getitem__.
For every assignment, regardless of its inputs, we get a call to __setitem__.
To properly visualize this, we can write a “logging namespace”:
And get the following result:
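A rough sketch of such a logging namespace is below. The names LoggingNamespace, LoggingMeta and Demo are made up for this illustration, and returning 0 for every unknown name is an assumption that simply keeps the class body from raising.

```python
def _is_dunder(name):
    return name.startswith("__") and name.endswith("__")


class LoggingNamespace(dict):
    """Records every non-dunder read and write made by the class body."""

    def __init__(self):
        super().__init__()
        self.log = []

    def __getitem__(self, name):
        if _is_dunder(name):
            return super().__getitem__(name)
        self.log.append(("get", name))
        return 0  # stand-in value, just so the body keeps running

    def __setitem__(self, name, value):
        if not _is_dunder(name):
            self.log.append(("set", name))
        super().__setitem__(name, value)


class LoggingMeta(type):
    @classmethod
    def __prepare__(mcs, name, bases, **kwargs):
        return LoggingNamespace()

    def __new__(mcs, name, bases, namespace, **kwargs):
        cls = super().__new__(mcs, name, bases, dict(namespace))
        cls.log = namespace.log  # keep the log around for inspection
        return cls


class Demo(metaclass=LoggingMeta):
    A
    B = A


for entry in Demo.log:
    print(*entry)
# get A
# get A
# set B
```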
As we can see - the __setitem__ calls only give us the name of the target variable.
We may be tempted to say this is enough, but consider the following code:
This is a valid enum definition, and it gives the exact same get-set sequence. We need more information.
The missing piece, as visible in the output, is the variable inputs into assignments.
We need to have the names of all the variables used in an assignment visible to us when we call __setitem__.
The value passed into __setitem__ is going to be controlled by:
- The values we return from __getitem__ calls in our namespace
- The literals used in the assignments
- The operations used to combine the above.
And constrained by the need to produce the correct result, and not just a list of names.
Among those 3 inputs (
__getitem__ results, literals, operations) we have full control over
With that, we need to somehow preserve both the names provided and the values calculated.
We’re going to do this by cheating.
We’re going to change our __getitem__ to return a special object - a placeholder object - that holds the name of the variable we read.
We don’t need to (and actually can’t) return the “true” value of the named variable, as that requires telling definitions and assignments apart.
In addition to holding the name, our placeholder objects will also implement all the operators we want to allow in our enum definitions.
Once more - we can’t calculate actual values.
Instead, we’ll construct a tree of operations.
A simplistic implementation of a placeholder, allowing only addition, will look something like this:
Note that we are maintaining a list of names, as we want to be able to count occurrences.
Additionally, be mindful of the reverse order in __radd__, as it is critical for calculating the correct values later.
Literals will always appear in an operation, so always within
Named variables, however, need a “trick”.
So we’ll represent them by assigning the name to the
None as the operator.
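As a sketch of the idea, an addition-only placeholder might look like this. The attribute names (op, operands, names) and the for_name helper are assumptions of this sketch; only the overall shape follows the description above: a list of names, a tree of operations, and None as the operator for a bare name.

```python
import operator


class Placeholder:
    def __init__(self, op, operands, names):
        self.op = op              # None marks a plain named variable
        self.operands = operands  # the name itself, or a (left, right) tuple
        self.names = names        # every name read inside this expression

    @classmethod
    def for_name(cls, name):
        return cls(None, name, [name])

    @staticmethod
    def _names_of(value):
        # Literals contribute no names.
        return value.names if isinstance(value, Placeholder) else []

    def __add__(self, other):
        return Placeholder(operator.add, (self, other),
                           self.names + self._names_of(other))

    def __radd__(self, other):
        # Called for "other + self", so other stays on the left.
        return Placeholder(operator.add, (other, self),
                           self._names_of(other) + self.names)


expr = 1 + Placeholder.for_name("A") + Placeholder.for_name("A")
print(expr.names)  # ['A', 'A']
```

The literal 1 only ever shows up as an operand inside the tree, while each name is wrapped in its own leaf node.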
With the placeholder mechanism in place, we can now tell which reads go to define new members, and which are assignment uses.
This allows us to translate any sequence of calls to __getitem__ followed by a call to __setitem__ (or not, if we reached the end of the enum definition) into a sequence of member definitions and assignments.
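One possible way to do that translation, sketched as a pure function over an event log. The event and action tuples, and the name split_events, are inventions of this sketch. It relies on the reads that feed an assignment always arriving right before it.

```python
def split_events(events):
    """Turn a namespace event log into ordered member actions.

    events:  ("read", name) or ("assign", target, names_used_on_rhs)
    returns: ("define", name) or ("assign", target)
    """
    actions, pending = [], []
    for event in events:
        if event[0] == "read":
            pending.append(event[1])
            continue
        _, target, used = event
        # The last len(used) reads fed this assignment; earlier ones were bare names.
        bare_count = len(pending) - len(used)
        actions += [("define", name) for name in pending[:bare_count]]
        actions.append(("assign", target))
        pending.clear()
    # Reads left over at the end of the class body are all bare names.
    actions += [("define", name) for name in pending]
    return actions


# Log for a body like:  A;  B = A + SOME_OS_ERROR;  C
log = [
    ("read", "A"),
    ("read", "A"),
    ("read", "SOME_OS_ERROR"),
    ("assign", "B", ["A", "SOME_OS_ERROR"]),
    ("read", "C"),
]
print(split_events(log))
# [('define', 'A'), ('assign', 'B'), ('define', 'C')]
```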
A part that I only alluded to before is that we’re about to have a post-processing step.
Due to the way we differentiate member definitions from assignments, we cannot define our members “as-we-go”.
Our __getitem__ and __setitem__ implementations only collect information, without creating any new variables.
When the class definition is done, our ScopedEnumMeta.__new__ method will be called, and we’ll get our namespace object as an input.
At that point, we’ll be able to convert the information we collected into a proper enum.
There are 2 types of members we need to handle. The first type is member definitions. They will work just as they did before - take the last assigned value, and increment it by one.
The second type - assignments - are a bit more involved.
An assignment can either be a regular value (if only literals were used), or it can be a placeholder.
If it is a placeholder, we need to traverse our tree of operations, and calculate a value:
But we’re missing a piece - recall that the operation for named variables is None, and the stored value is a name, not an actual value.
To get those, we need to do an extra step, and perform a lookup:
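A sketch of the traversal with the lookup step included. The Placeholder here is a stripped-down stand-in with assumed attribute names, and the two dictionaries (members for the enum so far, outer for everything outside it) are a simplification of this sketch.

```python
import operator


class Placeholder:
    def __init__(self, op, operands, names):
        self.op = op              # None marks a plain named variable
        self.operands = operands  # a name, or a (left, right) tuple
        self.names = names


def named(name):
    return Placeholder(None, name, [name])


def evaluate(node, members, outer):
    """Compute the value of a placeholder tree."""
    if not isinstance(node, Placeholder):
        return node  # a literal
    if node.op is None:
        # A named variable: prefer enum members, then the outside world.
        key = node.operands
        return members[key] if key in members else outer[key]
    left, right = node.operands
    return node.op(evaluate(left, members, outer),
                   evaluate(right, members, outer))


# The tree for "A + SOME_OS_ERROR + 1"
tree = Placeholder(
    operator.add,
    (Placeholder(operator.add, (named("A"), named("SOME_OS_ERROR")),
                 ["A", "SOME_OS_ERROR"]), 1),
    ["A", "SOME_OS_ERROR"],
)
print(evaluate(tree, {"A": 2}, {"SOME_OS_ERROR": 5}))  # 8
```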
As for getting the namespace - we’ll have to use the inspect module and peek at our parent stack-frames.
But I won’t get into this here.
And with this, we’re done.
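Putting the pieces together, the whole thing might look roughly like the sketch below. It is a compressed reconstruction under several assumptions, not the exact code behind this post: only addition is supported, member values start at 0, names such as CollectingNamespace and evaluate are invented here, and outside names are resolved by looking at the caller’s frame via inspect.currentframe().f_back (a CPython-specific trick that only sees the frame running the class statement).

```python
import inspect
import operator


def _is_dunder(name):
    return name.startswith("__") and name.endswith("__")


class Placeholder:
    def __init__(self, op, operands, names):
        self.op, self.operands, self.names = op, operands, names

    @staticmethod
    def _names(value):
        return value.names if isinstance(value, Placeholder) else []

    def __add__(self, other):
        return Placeholder(operator.add, (self, other), self.names + self._names(other))

    def __radd__(self, other):
        return Placeholder(operator.add, (other, self), self._names(other) + self.names)


class CollectingNamespace(dict):
    """Only records reads and assignments; members are created later."""

    def __init__(self):
        super().__init__()
        self.events = []

    def __getitem__(self, name):
        if _is_dunder(name):
            return super().__getitem__(name)
        self.events.append(("read", name))
        return Placeholder(None, name, [name])

    def __setitem__(self, name, value):
        if _is_dunder(name):
            super().__setitem__(name, value)
        else:
            self.events.append(("assign", name, value))


def evaluate(node, members, frame):
    if not isinstance(node, Placeholder):
        return node  # a literal
    if node.op is None:  # a named variable: members first, then the caller's scopes
        for scope in (members, frame.f_locals, frame.f_globals, frame.f_builtins):
            if node.operands in scope:
                return scope[node.operands]
        raise NameError(node.operands)
    left, right = node.operands
    return node.op(evaluate(left, members, frame), evaluate(right, members, frame))


class ScopedEnumMeta(type):
    @classmethod
    def __prepare__(mcs, name, bases, **kwargs):
        return CollectingNamespace()

    def __new__(mcs, name, bases, namespace, **kwargs):
        frame = inspect.currentframe().f_back  # the frame running the class statement
        members, pending, last = {}, [], -1

        def define(names):
            nonlocal last
            for member in names:
                last += 1
                members[member] = last

        for event in namespace.events:
            if event[0] == "read":
                pending.append(event[1])
                continue
            _, target, value = event
            # Reads not consumed by this assignment were bare member definitions.
            define(pending[:len(pending) - len(Placeholder._names(value))])
            pending.clear()
            members[target] = last = evaluate(value, members, frame)
        define(pending)
        return super().__new__(mcs, name, bases, {**namespace, **members})


SOME_OS_ERROR = 5


class Errors(metaclass=ScopedEnumMeta):
    Ok
    Missing
    OsError = SOME_OS_ERROR
    Other = OsError + 10
    Last


print(Errors.Ok, Errors.Missing, Errors.OsError, Errors.Other, Errors.Last)  # 0 1 5 15 16
```

Note that SOME_OS_ERROR is used but never becomes a member, because its only read is consumed by the assignment to OsError.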
Python is a wonderful language. It allows us to ask whether we can, without worrying about whether we should. In this post, I shared the implementation details and some of the thought process of implementing C++ Scoped Enums in Python. I hope you found this entertaining, and maybe learned a thing or two. And I sincerely hope you’ll never use this in any production code.