# Bitstring/bitmask encodings

Bitstring (or 'bitmask') encodings provide a compact means of storing flag values.

## Flags

For example, suppose we want to record the lunch preferences of a group of information analysts. You might imagine a table structure in which there is one `CHAR(1)`

column per foodstuff, so that you can add a row for each analyst and flag each food that the analyst likes:

Analyst | Bread | Jam | Cheese | Chocolate | Doughnuts |
---|---|---|---|---|---|

Bob | Y | N | N | Y | Y |

Jim | Y | Y | Y | N | Y |

Sue | Y | N | N | N | Y |

This is a fairly straightforward but expensive way to store the data – internally to SQL Server, each flag field requires a byte of storage for the Y/N value.

## Bits

The Y/N flags above could just as well be described as `1`

s and `0`

s, and a string of bits concatenated together (e.g. `10001`

for Sue's choices) coincides with the way that integer values are represented internally. Reading (by convention) from right to left, each bit (0 or 1) represents a successive power of 2: 1, 2, 4, 8, 16, 32, 64, 128 and so on. You can calculate the integer value represented by a bitstring by adding up the values of the positions occupied by a `1`

^{1)}; e.g.:

- The bitstring
`00000001`

means the number 1 - The bitstring
`00000101`

means the number 5 ( = 4 + 1) - The bitstring
`10110110`

means the number 182 ( = 128 + 32 + 16 + 4 + 2 )

Using this approach, we can store analysts' preferences using a single `TINYINT`

field holding the integer value encoded by the required bitstring. A `TINYINT`

occupies a single byte = eight bits (which is why the bitstrings below have been padded to length eight with leading zeroes), so is pretty efficient!

Analyst | Bitstring | FoodPreferenceValue |
---|---|---|

Bob | 00010011 | 19 |

Jim | 00011101 | 29 |

Sue | 00010001 | 17 |

This is the way that database statuses are encoded in SQL Server's system table sys.sysdatabases (now deprecated – it was supplanted by the sys.databases DMV in SQL 2005).

## Bitwise AND

The next question is, given an integer value, how can I find out which bits are set (i.e. have the value `1`

)? The answer is to use the 'bitwise AND' operator, denoted '&'.

The bitwise AND operator takes two values and returns a value that corresponds to the Boolean ANDing of bits in corresponding positions in the inputs. For example:

00011011 & 00010101 = 00010001

because (reading the two inputs from

**right to left**):- 1 (the first bit of the first input) & 1 (the first bit of the second input) = 1 (the first bit of the output)
- 1 & 0 = 0
- 0 & 1 = 0
- 1 & 0 = 0
- 1 & 1 = 1
- 0 & 0 = 0
- 0 & 0 = 0
- 0 & 0 = 0

So if you want to know if a bit is set in a given input value, you can find out by bitwise ANDing it with another value in which **only** that bit is set. e.g Is the fourth bit set in the value 91?

- Find the bitstring in which only the fourth bit (R→L) is set:
`00001000`

. This is called a 'mask'. - Calculate the integer value that this represents: 8
- Take the bitwise AND of the input value and the mask. If the mask's bit is set in the input value, then the result will be the value of the mask, i.e.:
- if 91 & 8 = 8, then the fourth bit
**is**set in 91 - if 91 & 8 <> 8, then the fourth bit is
**not**set in 91

In fact, the fourth bit **is** set in 91, as you can see from its bitstring: `01011011`

. Taking the bitwise AND of this string with `00001000`

(8) produces `00001000`

(8) - which is the effect we expect.

### Some examples

Does Sue like doughnuts?

- 'Doughnuts' are represented by
`00000001`

- This has the value 1
- Sue's food preferences value is 17
- 17 & 1 = 1

→ Sue likes doughnuts

Does Bob like jam?

- 'Jam' is represented by
`00001000`

- This has the value 8
- Bob's food preferences value is 19
- 19 & 8 = 0

→ Bob doesn't like jam

Does Jim like cheese

**and**chocolate- 'Cheese' is bit 3
- 'Chocolate' is bit 2
- 'Cheese and chocolate' is represented by
`00000110`

- This has the value 6
- Jim's food preferences value is 29
- 29 & 6 is 4 (
**not**6!)

→ Jim doesn't like cheese and chocolate.

(Note that here we mean that Jim doesn't like

**both**cheese**and**chocolate – the fact that Jim likes cheese is necessary for him to like both, but it isn't sufficient).

### In SQL

To find analysts who like chocolate:

SELECT * FROM AnalystFoodPreferences WHERE FoodPreferenceValue & 2 = 2

The fact that this only returns Bob is likely to be a data quality issue.

^{1)}