Bitsets
Track a lot of things in a smaller space
Bitsets are cool little data structures that track membership of a set by flipping bits. I also think they're a great illustration of how an alternative implementation of a structure can improve memory and/or processor usage far more than continuing to fine-tune an existing implementation.
Set interface
Let's start with defining our set interface. Sets only track membership so we need methods for inserting, removing and testing membership. We'll also want some set operations which combine two sets in several ways, I picked intersection, difference and union. Finally a method to check if two sets are equal in terms of storing exactly the same keys.
- Intersect: Given Set A and Set B, return Set C consisting only of elements present in both A and B
- Difference: Given Set A and Set B, return Set C consisting of elements in A that are not in B
- Union: Given Set A and Set B, return Set C consisting of all elements of A and B
There are more operations that test two sets, such as subset and strict subset testing, but these are the only ones we'll be implementing. I've also decided to somewhat arbitrarily restrict our inputs to uint[^7], but more on this later.
We'll ignore that many types could implement this interface and we'll always cast an incoming Set to our current type.
Hashset
Go doesn't have a hashset type, but it does expose map, which is backed by a hash table[^8]. We'll use this as a convenience over implementing our own hash table, which isn't what I'm interested in today. The type itself just houses a map, and the first few operations are thin repackagings of map behavior.
Interacting with maps is easy, so these methods are easy to implement around one. We don't need to use reflect.DeepEqual in our Eq(Set) bool other than for continued brevity; we could instead compare the maps' lengths and, if they're the same, iterate through the keys of one and ensure they're all present in the other.
I don't find these methods particularly interesting either: we loop through our current set and construct a new set by checking whether a given key is present in the other set. For union we just dump everything into the new set. map's easy-to-use interface is still putting in work here, especially iteration that hands up both the key and value. We could write r[k] = sentinel but r[k] = i is shorter.
Interesting or not, we have a working set that fulfills our interface. Let's run a few benchmarks to get a feel for its performance.
Hash Benchmarks
I'm not a benchmarking expert by any means, and I didn't ensure the benchmarks were running in an isolated environment with consistent memory and CPU access. Instead these are running inside my editor, alongside several browser tabs including a playing video, and I'm pretty sure the Plex server on this machine is in active use right now. That's how my code usually runs, so I'm fine including that noise in my benchmarks.
All that said, I'm not at all surprised by these results. For reference, 3031971678ns is roughly 3.03s, which for iterating through 10 million items twice and throwing them all into a single map sounds about right to me. To say that the hash set has hot loops is, I think, an understatement, and branching certainly doesn't help. More accurate benchmarking would likely introduce some fuzziness or even more pathological conditions in the sets, but this is good enough for me.
3.03 seconds to combine twenty million items might as well be infinitely faster than I could do it myself, but it does seem kind of slow for what's ultimately a bunch of integer comparisons.
Bitmask
Before getting into a bitset, it's useful to look at its little sibling, the bitmask. Under the hood, integers are actually just a clump of bits that are interpreted as a binary number: uint64(7) == 0b00..111[^1]. There are differences in how a uint64 and an int64 are interpreted: with unsigned integers we get the full field, while with signed integers we get one less bit because the highest bit is reserved for the sign. This also has implications for how large an absolute number a signed integer can represent compared to an unsigned integer; that's less interesting for us, but useful to know if you don't often work with unsigned and signed integers. Also note I've said "integer" because floats are handled much differently and I won't be covering them here.
The biggest reason we're using uint64 instead of int64 is to avoid negative numbers, which are invalid indices and result in a compiler error when used as a constant shift amount.
We can take advantage of this underlying bit field to encode up to 64[^2] individual flags. While we can't actually index into this field like field[54] = 1, we can effectively do that via shifting. 1 << 54 creates an integer where only the 54th bit is set. From there we can use other operators to combine this mask with our existing field.
If we want to set this bit, and only this bit, we bitwise OR the two values together: field | (1 << 54). This operator says "take every bit that's present in either bit field", which sets our single bit while leaving the rest of the field alone. Since 1 << 54 is mostly zeros, we'll take the values of the original field for those positions.
We can check if this bit is set by using a bitwise AND operator: field & (1 << 54) will return either 1 << 54, if the field also has the 54th bit set, or 0 if it doesn't, because we're taking only the bits present in both the field and the mask.
Finally, we can unset a single bit by combining a bitwise COMPLEMENT and a bitwise AND. First we create our mask 1 << 54, then we invert it: ^(1 << 54). This turns our mask with only the 54th bit set into its opposite: every bit except the 54th is set. Finally we AND these to take every bit from the original field except the 54th, since it is zero in our mask: field & ^(1 << 54).
These three operations are our set, is-set and unset operations. If we use two fields with OR and AND then we also have union and intersect. We can also use the same COMPLEMENT-AND combination to represent difference. But what if we want to store more than 64 things? What if we want to store 128 things, or even more?
Bitsets
That's where bitsets come into play. By dividing our input value by the number of bits in the field, again 64 in this case, we determine which bucket we need. From there we modulo our input to determine which bit we need to twiddle. For example, 70 lands at bit 70 % 64 = 6 of bucket 70 / 64 = 1. Take a moment to convince yourself of this.
With that in mind, let's start looking at the implementation.
The set type isn't very different from the hash set; just the container type is different, and we have two helper functions. I prefer to keep these separate even in an actual implementation because it helps me keep track of which value is which. Our set, unset and is-set implementations are slightly more complicated than just finding a bucket and interrogating a single bit.
Set handles finding the bucket and setting the correct bit, but it also handles ensuring the slice is big enough for the largest value. There are lots of strategies to grow a slice; growing to just big enough is easy to implement without adding too much noise to the code we're actually interested in examining.
Unset and IsSet also take advantage of the number of buckets: if our target is bigger than we can store, then we either have nothing to unset or we simply don't have it in our set. Otherwise we use the techniques described above to unset or interrogate the single bit.
Since intersection is defined as only the elements common to two sets, we can't simply copy one of the existing sets, since it could have more buckets than the other. Instead we create a bitset with the least number of buckets necessary, and then, since we have at least that many buckets in both sets, we can walk their indices and use bitwise AND to combine the two sets.
For union we want every value in both sets, so we can copy the set with more buckets. Aliasing a and b to lo and hi makes keeping track of things a little easier than flipping a and b around directly. Once we have the longer set's buckets copied in, we gather up the values in the second set and OR them into our values, which gives us the complete union.
Difference and intersect are almost the same, except we want to begin with every value in our current set rather than the least number of buckets, and our operation is the AND-COMPLEMENT construct rather than just AND.
Finally, to complete the interface we need to implement Eq(Set) bool, which is actually non-trivial. We can't just compare the lengths of both underlying slices because the higher buckets could just be 0: a high value was set at one point but isn't any longer.
Bitset Benchmarks
Okay, we've done the implementation; let's look at the benchmarks. Again, I've taken no particular care to ensure these ran in isolation, etc.
These are massive improvements. Even the fastest hash set results were at least an order of magnitude slower than their bitset counterparts. It's probably unfair to compare the 10 million tests since the hash set benchmarks only ran once each, but:
- Intersection: 5800 times faster, 310 times less memory
- Difference: 2425 times faster, 155 times less memory
- Union: 6346 times faster, 310 times less memory
It's possible to squeeze even more performance out of bitsets with techniques like loop unrolling and "single instruction, multiple data" (SIMD), which would allow us to compare multiple buckets at the same time[^9].
The allocation counts are also far, far lower, though I'm not sure why they aren't two per benchmark, since every benched method creates both a new bitset and a new underlying slice for it.
However, I do want to say that map[K]V is a general purpose collection and it needs to serve several general purpose needs. We could make a specialized implementation of a hash table well suited for unsigned integer keys. Effectively that's exactly what we've done; our hashing method is just division and modulo rather than another technique.
Bonus
There are a few more interesting methods worth exploring with bitsets: a Len() method that calculates the count from the current state of the bitset, and iterating through the members of the bitset. If you were as unfamiliar with bit twiddling as I was when I started implementing this change in my ECS, you might be thinking both operations grow linearly with each bucket added. Maybe something like this:
The benefit of this approach is that it's simple, requires little explanation, and we could easily unroll the inner loop to hard-code the 64 comparisons if we wanted. Some creativity could avoid the branch as well. Regardless, we'll always check all 64 bits of each bucket, which is fine for very dense bitsets; you won't be losing much between this and other solutions. However, if we have primarily sparse bitsets then we might want to explore other options. I found myself in the primarily-sparse situation.
First let's look at one that uses an interesting property of binary numbers:
What the heck does n & (n-1) do? You may have encountered it as a way to check if a number is a power of 2, but that's just one application of this characteristic. Subtracting 1 from a binary number flips the least significant set bit and all the bits below it, so ANDing n with n - 1 clears the lowest set bit. Consider a bitset of {0, 8, 11}, that is, the 1st, 9th and 12th bits set, and how it would be processed:[^3]
```
n            | n & (n-1)
100100000001 | 100100000000
100100000000 | 100000000000
100000000000 | 000000000000
```
Instead of checking against all 64 bits, we just count how many iterations it takes to reduce the bucket to 0. We can actually do even better with some complicated bit flipping, but the Go standard library already did that for us and put it behind bits.OnesCount64[^4], so our length count can actually just be:
We could use the same naive method to locate every set member: instead of incrementing a count, we'd calculate bucketNum * 64 + i to reconstruct the original integer and toss it into a slice. Instead, we can use find-first-set techniques to figure out which individual bits are set in each bucket, in particular bits.TrailingZeros64, which lets us run exactly the same algorithm but with a different i: k * 64 + TrailingZeros64(bucket). A trailing zeros operation starts counting at the first bit and counts every 0 until it reaches the first 1, the least significant set bit[^5].
bucket ^= (1 << tz) looks confusingly like our friend bitwise COMPLEMENT, but it is actually XOR, or eXclusive OR. XOR records false, or 0, when the inputs are the same and true, or 1, when the inputs are different.
```
a b | a ^ b
0 0 | 0
0 1 | 1
1 0 | 1
1 1 | 0
```
Let's work through our {0, 8, 11} with this algorithm:
```
bucket       | tz | k*64 + tz
100100000001 |  0 | 0
100100000000 |  8 | 8
100000000000 | 11 | 11
000000000000 |    |
```
At each step the number of trailing zeros represents the original value we stored[^6], and the XOR with (1 << tz) removes that single bit from the set, leaving us with more zeros to count.
Full Code Listing
[^1]: Eliding some of the spots for space. There are 64 bits here for both signed and unsigned, but most of them are going to be zero in this example.

[^2]: The size of our bit field; a uint16 would be 16 and a uint32 would have 32. uint is hardware dependent and is probably 64, but it could be a smaller value, or possibly larger if you're reading this in the future!

[^3]: I've truncated the higher bits off the 64 to save horizontal space.

[^4]: Currently ordering a copy of Hacker's Delight so I can learn more bit tricks.

[^5]: Here's the general idea:

    ```
    n    | tz
    111  | 0
    110  | 1
    100  | 2
    1000 | 3
    ```

[^6]: Not the result of the left shift, not that I've tricked myself into thinking that before.

[^7]: The set could receive a generic type but I felt it would distract from the core point.

    ```go
    type Hasher interface {
    	// should be constant to provide identity
    	Hash() uint
    }

    type Set[T Hasher] interface {
    	Set(T)
    	// ...
    }
    ```

    And then we would call T.Hash() and we're back at our implementations. A more interesting generic parameter is [T ~uint64], at least for the purposes I ended up using bitsets for. Without it, the first thing you end up doing is uint64(i) to place the value into the set.

    ```go
    type Id uint64

    type Set[T ~uint64] interface {
    	Set(T)
    }

    var ids Set[Id] = // ...
    var i Id = 9
    ids.Set(i)
    ```

[^8]: As of time of writing, at least.

[^9]: It's also possible to swap division and modulo, min/max, and possibly even some of the branches with more complicated bitwise operations if we were trying to squeeze out every single drop and then some.