I originally posted this as a series of tweets. With the state of Twitter, I decided to convert it to a blog post.

So… A floating point question.

How many items are expected in a set (Python’s set, C++’s std::set, Go’s map[float64]bool, etc.) when I fill it with NaN values?

This seems to work differently in different languages.

C++

If we run the following C++ code:

#include <set>
#include <cmath>

int main() {
    std::set<float> floatSet;
    
    for (auto i = 0; i < 10; ++i) {
        floatSet.insert(NAN);
    }

    return floatSet.size();  // 1
}

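We get a single value in the set. This is a bit unexpected, as NAN is not equal to itself. The catch is that std::set never uses ==. It treats two keys as equivalent when neither compares less than the other, and every < comparison involving NaN is false.

If we try to add more items to the set after the NAN, we’ll also see that the set is effectively broken: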

#include <set>
#include <cmath>

int main() {
    std::set<float> floatSet;

    floatSet.insert(NAN);
    
    for (auto i = 0; i < 10; ++i) {
        floatSet.insert(i);
    }

    return floatSet.size();  // 1
}

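This returns 1 as well: compared against NaN, neither value is less than the other, so each new value is rejected as a duplicate. Strictly speaking, NaN breaks the strict weak ordering that std::set requires of its comparison, so none of this behaviour is guaranteed. Note that if we first add non-NaN values to the set, it seems to work OK.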
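std::set decides whether two keys are duplicates using only <: two keys count as equivalent when neither is less than the other. A minimal sketch of that rule, written in Python purely as a neutral way to model it (equivalent is a hypothetical helper, and the sketch assumes the default less-than ordering):

```python
import math

def equivalent(a, b):
    """Ordering-based equivalence: neither value sorts before the other."""
    return not (a < b) and not (b < a)

nan = math.nan

print(equivalent(nan, nan))  # True: NaN looks like a duplicate of itself
print(equivalent(nan, 3.0))  # True: NaN looks like a duplicate of any number
print(equivalent(1.0, 3.0))  # False: ordinary values are told apart
```

Under this rule a NaN sitting alone in the set matches every value that arrives later, which is consistent with the size of 1 seen above.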

Python

In Python code, things behave a little differently.

Filling a set with the same NaN object will give us a set with a single element:

floats = set()

nan = float("NaN")
for _ in range(10):
    floats.add(nan)

print(len(floats))  # 1

While filling it with different NaN objects will give us a set with multiple objects:

floats = set()

for _ in range(10):
    nan = float("NaN")
    floats.add(nan)

print(len(floats))  # 10

You see, objects in Python sets are only required to be hashable, and a set lookup checks identity before it checks equality. So, if 2 values are the same object (spelled a is b, or id(a) == id(b)), they’ll only appear once in a set. If they are not the same object, they are compared with ==, and since NaN == NaN is always false, every new NaN object counts as a new element.
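A quick way to see the difference is to compare ==, which follows the floating point rules, with a membership test, which tries identity first. A small sketch (the variable names are only for illustration):

```python
nan = float("nan")
other = float("nan")  # a second, distinct NaN object

print(nan == nan)      # False: NaN never equals anything, itself included
print(nan is nan)      # True: it is the same object
print(nan in {nan})    # True: membership checks identity before equality
print(other in {nan})  # False: different object, and == fails too
```

This is why adding the same NaN object ten times yields one element, while ten separate float("NaN") calls yield ten.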

Totally expected behaviour.

Go

Go code seems to be the only one that behaves as expected:

package main

import (
    "fmt"
    "math"
)

func main() {
    floatSet := make(map[float64]bool)
    nan := math.NaN()
    for i := 0; i < 10; i++ {
        floatSet[nan] = true
    }

    fmt.Println(len(floatSet))  // 10
}

We insert “the same” NaN values, and still get 10 values in the set.

NaN Bonus!

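Go is getting a built-in clear function for emptying maps (it landed in Go 1.21). This is mostly required because there’s no other way to remove NaN keys from a map: delete(m, math.NaN()) never matches, since the lookup compares keys with == and NaN is not equal to itself.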

More Words

Floating point numbers are weird. They are weird when they work correctly (thanks Go) and weirder when they don’t.

If you have any choice in the matter, never use them as keys. If you do, well, be prepared to have some interesting bugs.
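If float keys are unavoidable, one option is to reject NaN before it ever reaches the container. A minimal sketch in Python, where add_float is a hypothetical helper rather than anything from the standard library:

```python
import math

def add_float(target: set, value: float) -> bool:
    """Add value to target unless it is NaN; return False if it was skipped."""
    if math.isnan(value):
        return False  # refuse NaN keys outright
    target.add(value)
    return True

floats = set()
for value in [1.0, float("nan"), 2.0, float("nan")]:
    add_float(floats, value)

print(len(floats))  # 2
```

This only sidesteps the NaN problem. Rounding errors can still make two "equal" floats land in different slots.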