Bitstring/bitmask encodings

Bitstring (or 'bitmask') encodings provide a compact means of storing flag values.

For example, suppose we want to record the lunch preferences of a group of information analysts. You might imagine a table structure in which there is one CHAR(1) column per foodstuff, so that you can add a row for each analyst and flag each food that the analyst likes:

Analyst Bread Jam Cheese Chocolate Doughnuts
Bob Y N N Y Y
Jim Y Y Y N Y
Sue Y N N N Y

This is a fairly straightforward but expensive way to store the data – internally to SQL Server, each flag field requires a byte of storage for the Y/N value.

The Y/N flags above could just as well be described as 1s and 0s, and a string of bits concatenated together (e.g. 10001 for Sue's choices) coincides with the way that integer values are represented internally. Reading (by convention) from right to left, each bit (0 or 1) represents a successive power of 2: 1, 2, 4, 8, 16, 32, 64, 128 and so on. You can calculate the integer value represented by a bitstring by adding up the values of the positions occupied by a 11); e.g.:

  • The bitstring 00000001 means the number 1
  • The bitstring 00000101 means the number 5 ( = 4 + 1)
  • The bitstring 10110110 means the number 182 ( = 128 + 32 + 16 + 4 + 2 )

Using this approach, we can store analysts' preferences using a single TINYINT field holding the integer value encoded by the required bitstring. A TINYINT occupies a single byte = eight bits (which is why the bitstrings below have been padded to length eight with leading zeroes), so is pretty efficient!

Analyst Bitstring FoodPreferenceValue
Bob 00010011 19
Jim 00011101 29
Sue 00010001 17

This is the way that database statuses are encoded in SQL Server's system table sys.sysdatabases (now deprecated – it was supplanted by the sys.databases DMV in SQL 2005).

The next question is, given an integer value, how can I find out which bits are set (i.e. have the value 1)? The answer is to use the 'bitwise AND' operator, denoted '&'.

The bitwise AND operator takes two values and returns a value that corresponds to the Boolean ANDing of bits in corresponding positions in the inputs. For example:

  • 00011011 & 00010101 = 00010001

    because (reading the two inputs from right to left):

    1. 1 (the first bit of the first input) & 1 (the first bit of the second input) = 1 (the first bit of the output)
    2. 1 & 0 = 0
    3. 0 & 1 = 0
    4. 1 & 0 = 0
    5. 1 & 1 = 1
    6. 0 & 0 = 0
    7. 0 & 0 = 0
    8. 0 & 0 = 0

So if you want to know if a bit is set in a given input value, you can find out by bitwise ANDing it with another value in which only that bit is set. e.g Is the fourth bit set in the value 91?

  1. Find the bitstring in which only the fourth bit (R→L) is set: 00001000. This is called a 'mask'.
  2. Calculate the integer value that this represents: 8
  3. Take the bitwise AND of the input value and the mask. If the mask's bit is set in the input value, then the result will be the value of the mask, i.e.:
    • if 91 & 8 = 8, then the fourth bit is set in 91
    • if 91 & 8 <> 8, then the fourth bit is not set in 91

In fact, the fourth bit is set in 91, as you can see from its bitstring: 01011011. Taking the bitwise AND of this string with 00001000 (8) produces 00001000 (8) - which is the effect we expect.

  1. Does Sue like doughnuts?

    • 'Doughnuts' are represented by 00000001
    • This has the value 1
    • Sue's food preferences value is 17
    • 17 & 1 = 1

    → Sue likes doughnuts

  2. Does Bob like jam?

    • 'Jam' is represented by 00001000
    • This has the value 8
    • Bob's food preferences value is 19
    • 19 & 8 = 0

    → Bob doesn't like jam

  3. Does Jim like cheese and chocolate

    • 'Cheese' is bit 3
    • 'Chocolate' is bit 2
    • 'Cheese and chocolate' is represented by 00000110
    • This has the value 6
    • Jim's food preferences value is 29
    • 29 & 6 is 4 (not 6!)

    → Jim doesn't like cheese and chocolate.

    (Note that here we mean that Jim doesn't like both cheese and chocolate – the fact that Jim likes cheese is necessary for him to like both, but it isn't sufficient).

To find analysts who like chocolate:

SELECT * 
FROM AnalystFoodPreferences
WHERE FoodPreferenceValue & 2 = 2

The fact that this only returns Bob is likely to be a data quality issue.


1)
Bits with this value are said to be 'set'.