Reading binary data with Node.js

This article covers the basics of working with binary data in Node.js. These are the same basics you need to work with Bitcoin at a low level using Node.js.

Basics

First, let's get the basics out of the way with a quick refresher on binary data.

Bits

A bit represents a single binary position that is either 0 or 1, off or on respectively.

When bits are concatenated to make up a binary stream, they can represent all sorts of data. The stream of bits may look like 1011. The meaning of those bits is dependent on how the information was encoded. We need to perform the correct decoding operation to extract the meaning.

Bytes

There are 8 bits in 1 byte. The terms 8-bit, byte, or octet are interchangeable. An example of a byte is 0111 0101.

The bits in a byte are usually read from right to left, where each digit represents a power of two that is on (1) or off (0).

More concretely a byte representing unsigned integer values will look like:

Index:   7  6  5  4  3  2  1  0  
Value: 128 64 32 16  8  4  2  1  

With our example above, the value of 0111 0101 can be broken down as:

2 Pow:   7  6  5  4  3  2  1  0  
Dec:   128 64 32 16  8  4  2  1

Bits:    0  1  1  1  0  1  0  1  
On-bits: 0 64 32 16  0  4  0  1  

Summing the on-bits of 0111 0101 gives 64+32+16+4+1, so the byte has an unsigned-integer value of 117.

If you turn on all bits in this byte, you can end up with 128+64+32+16+8+4+2+1 = 255. Therefore, one byte can represent an unsigned integer from 0-255.
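
The arithmetic above can be checked in a few lines of JavaScript. This is only a minimal sketch: the helper name bitsToUInt8 is made up for illustration and is not part of Node.js.

```javascript
// Hypothetical helper: add up the power of two for every bit that is on.
function bitsToUInt8(bits) {
  let value = 0;
  for (let i = 0; i < 8; i++) {
    // bits[0] is the left-most bit, which has index 7 (value 128).
    if (bits[i] === '1') value += 2 ** (7 - i);
  }
  return value;
}

console.log(bitsToUInt8('01110101')); // logs 117
console.log(bitsToUInt8('11111111')); // logs 255
console.log(parseInt('01110101', 2)); // logs 117, using the built-in parser
```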

Armed with this knowledge, we will simplify the binary data by representing it as bytes. We can represent each byte with the numbers 0-255 as a proxy for the various bitwise values that can be turned on. In this case, as shown above, 8 0's would be represented by the number 0 and 8 1's would be represented by the number 255.

We will use the 0-255 representation of a single byte from here on to make the conceptual model of reading binary data simpler.

Now that we have an understanding of small numbers, we will discuss how they can be used to build larger numbers.

Endianness

In most circumstances, you will need numbers larger than 255. We need a way to combine small numbers to represent a larger number. Fortunately, there are already constructs to do that.

Intuitively, if we extend the example above by adding another byte to the left, we increase the number of values we can represent from 2^8 (256) to 2^16 (65536). If we double that, we have 4 bytes that can hold 2^32 (4294967296) distinct values, which is the common 32-bit unsigned integer. Pretty neat!
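
As a quick sanity check of those sizes, the largest unsigned value for a given number of bytes can be computed directly. The function name maxUnsigned is an assumption made for this sketch.

```javascript
// Hypothetical helper: largest unsigned integer that fits in byteCount bytes.
function maxUnsigned(byteCount) {
  return 2 ** (8 * byteCount) - 1;
}

console.log(maxUnsigned(1)); // logs 255
console.log(maxUnsigned(2)); // logs 65535
console.log(maxUnsigned(4)); // logs 4294967295
```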

Conceptually, larger numbers can be constructed by appending additional bytes to grow the number space. The additional bytes can be added to the left side or the right side.

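You will often hear the terms big-endian and little-endian. These terms refer to the order of the bytes: in big-endian the most significant byte comes first, so bytes are added on the left side, while in little-endian the least significant byte comes first, so bytes are added on the right side.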

Big-endian

Big-endian is straightforward in that we prepend bytes to grow the number. This is the same way we write decimal numbers such as 1000 versus 100, where the most significant (largest) digit is added on the left side.

Let's start with our number 117 that we had above. If we prepend another byte, we make a larger big-endian number. Recall that with 1-byte, we could only represent 0-255. But with two bytes we can now represent 0-65535.

Bytes: [0, 117]  

If we set the value of the first byte to 18, we end up with a 16-bit, 2-byte representation that is:

Bytes:  [18, 117]  

If we were to make a Node.js Buffer and read it (more on this later), we end up with the value 4725:

let a1 = Buffer.from([18,117]);  
console.log(a1.readUInt16BE(0));  // logs 4725  
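
To see where 4725 comes from, the same value can be computed by hand. This is a small sketch and the variable names are only illustrative.

```javascript
const bigEndianBytes = [18, 117];

// The first byte is the most significant, so it is worth 256 times more.
const manualBE = bigEndianBytes[0] * 256 + bigEndianBytes[1];

console.log(manualBE); // logs 4725
console.log(manualBE === Buffer.from(bigEndianBytes).readUInt16BE(0)); // logs true
```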

Little-endian

Little-endian is conceptually the same, but instead of prepending bytes we append the additional bytes.

Bytes:  [117, 18]  

If we make a Node.js Buffer from these bytes and read it as little-endian, we end up with the same value, 4725:

let a2 = Buffer.from([117,18]);  
console.log(a2.readUInt16LE(0)); // logs 4725  
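
The byte order matters when reading. As an illustration (the variable name is made up for this sketch), reading the same two bytes with the wrong endianness produces a different number:

```javascript
const littleEndianBytes = Buffer.from([117, 18]);

console.log(littleEndianBytes.readUInt16LE(0)); // logs 4725
console.log(littleEndianBytes.readUInt16BE(0)); // logs 29970, wrong byte order
```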

Hexadecimal

The last piece we should understand is hexadecimal. Hexadecimal, or hex, is a base-16 number system. It is commonly used to output and display binary data in human-readable form. A hex digit uses 0-9 to represent the values 0-9 and A-F to represent 10-15. Therefore a single hex digit covers 16 values, from 0 to F.

To represent 8 bits or 1 byte, we can use two hex digits. For instance, 01 represents the number 1, A5 represents 165, and FF represents 255.

If we combine this with what we went over with endianness, we can write the bytes of the uint16 (2-byte) number 4725, in the order they are stored, as follows:

Big-endian:    0x1275  
Little-endian: 0x7512  
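
These hex values can be reproduced with built-in conversions. The sketch below uses the Buffer write methods and toString('hex'); the variable names are only illustrative.

```javascript
console.log(parseInt('A5', 16));    // logs 165
console.log((255).toString(16));    // logs ff

const hexBE = Buffer.alloc(2);
hexBE.writeUInt16BE(4725, 0);       // store 4725 with the big byte first
console.log(hexBE.toString('hex')); // logs 1275

const hexLE = Buffer.alloc(2);
hexLE.writeUInt16LE(4725, 0);       // store 4725 with the small byte first
console.log(hexLE.toString('hex')); // logs 7512
```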

Basics Wrap up

Feel free to skip this brief section. If you're still a little confused, you can study the tables below to see how powers of two, bits, byte values, and hex fit together for big- and little-endian numbers.

Big-endian  
Pow 2: 15 14 13 12 11 10  9  8   7  6  5  4  3  2  1  0  
Bits:   0  0  0  1  0  0  1  0   0  1  1  1  0  1  0  1  
Byte:                       18                      117  
Hex:                      0x12                     0x75  

Little-endian  
Pow 2:  7  6  5  4  3  2  1  0  15 14 13 12 11 10  9  8  
Bits:   0  1  1  1  0  1  0  1   0  0  0  1  0  0  1  0  
Byte:                      117                       18  
Hex:                      0x75                     0x12  

Node Buffers

Buffers are one of the workhorses of binary data in Node.js. If you're not very familiar with them, I recommend reading the Buffer documentation.

We can allocate a zero-filled buffer using the alloc static method, or create a buffer from an array of uint8s, from a hex string, or from a text string.

let buffer1 = Buffer.alloc(24);  
let buffer2 = Buffer.from([0,0,0,1,0,0,0,2]);  
let buffer3 = Buffer.from('0000000100000002', 'hex');  
let buffer4 = Buffer.from('some text', 'utf8');  
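
To confirm what each of these calls produces, the resulting buffers can be inspected. This is a small sketch with illustrative variable names; it uses a 4-byte allocation only to keep the output short.

```javascript
const zeroed = Buffer.alloc(4);
const fromBytes = Buffer.from([0, 0, 0, 1, 0, 0, 0, 2]);
const fromHex = Buffer.from('0000000100000002', 'hex');
const fromText = Buffer.from('some text', 'utf8');

console.log(zeroed.length);             // logs 4
console.log(zeroed.toString('hex'));    // logs 00000000
console.log(fromBytes.equals(fromHex)); // logs true, same bytes either way
console.log(fromText.length);           // logs 9, one byte per ASCII character
console.log(fromText.toString('utf8')); // logs some text
```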

When a buffer contains binary data, it contains a fixed-size array of bytes. The meaning of these bytes is dependent on how the data was encoded.

Fortunately, armed with knowledge of what the data represents, you can use the built-in Buffer methods to read big- or little-endian values.

As we mentioned previously, a single byte can be represented by the numbers 0-255. As such, we build arrays of bytes as integers and import them into a buffer to represent the binary data.

The standard operations for reading values from the buffer are:

readUInt8  
readUInt16BE  
readUInt16LE  
readUInt32BE  
readUInt32LE  
readInt8  
readInt16BE  
readInt16LE  
readInt32BE  
readInt32LE  
readDoubleBE  
readDoubleLE  
readFloatBE  
readFloatLE  
slice  

With these operations we can read a single byte with readUInt8, or read 16- or 32-bit integer values.

Each of these methods can take an offset to signify which position to start the read. This will allow you to traverse through the buffer.
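
As a sketch of traversing a buffer, the loop below advances the offset by two bytes after each 16-bit read. The buffer contents and variable names are made up for illustration.

```javascript
const packed = Buffer.from([0, 1, 0, 2, 0, 3, 0, 4]);
const values = [];

// Step through the buffer two bytes at a time, stopping before
// a read would run past the end.
for (let offset = 0; offset + 2 <= packed.length; offset += 2) {
  values.push(packed.readUInt16BE(offset));
}

console.log(values); // logs [ 1, 2, 3, 4 ]
```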

If there are not enough bytes in the buffer to read the number, an exception will be thrown. For example, readUInt32BE/LE requires 4 bytes, and if there are only 3 bytes an exception is thrown.
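
A minimal sketch of handling that case is shown below. It assumes the error is a RangeError, which is what current Node.js releases throw for out-of-bounds reads; the variable names are illustrative.

```javascript
const tooShort = Buffer.from([0, 0, 1]);
let caught = null;

try {
  tooShort.readUInt32BE(0); // needs 4 bytes, only 3 are available
} catch (err) {
  caught = err;
}

console.log(caught instanceof RangeError); // logs true
```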

Conspicuously missing from this list are 64-bit integers. JavaScript numbers are double-precision floating-point values, so integers are only exact up to 2^53 - 1. On older versions of Node.js you will need a big number library such as bn.js or bignumber.js to read these values. Node.js 12 and later also provide BigInt-based methods such as readBigUInt64BE and readBigInt64LE.
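
For 64-bit values on Node.js 12 or later, the BigInt-based readers such as readBigUInt64BE can be used instead of a library. A minimal sketch, assuming such a runtime; the variable names are only illustrative:

```javascript
const eightBytes = Buffer.from([0, 0, 0, 0, 0, 0, 0x12, 0x75]);
const big = eightBytes.readBigUInt64BE(0);

console.log(typeof big);    // logs bigint
console.log(big === 4725n); // logs true

// All bits on: the largest unsigned 64-bit value, beyond 2^53 - 1.
const maxBytes = Buffer.alloc(8, 0xff);
console.log(maxBytes.readBigUInt64BE(0)); // logs 18446744073709551615n
```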

In addition to the read operations, there are write operations for setting values on buffers.
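
As a brief sketch of the write side, each write method takes a value and an offset and returns the offset just past the bytes it wrote, which makes it easy to chain writes. The buffer size and values here are made up for illustration.

```javascript
const out = Buffer.alloc(7);
let pos = 0;

pos = out.writeUInt8(1, pos);       // 1 byte, pos is now 1
pos = out.writeUInt16BE(4725, pos); // 2 bytes, pos is now 3
pos = out.writeUInt32LE(2, pos);    // 4 bytes, pos is now 7

console.log(pos);                   // logs 7
console.log(out.toString('hex'));   // logs 01127502000000
```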

Let's start with the most basic example by reading individual bytes from a buffer. As mentioned before, we will build the buffer from uint8 numbers to represent the 8-bit values.

let b1 = Buffer.from([1, 2, 3, 4, 5, 6, 7, 8]);  
console.log(b1.readUInt8());  // logs 1  
console.log(b1.readUInt8(0)); // logs 1  
console.log(b1.readUInt8(1)); // logs 2  
console.log(b1.readUInt8(7)); // logs 8  
console.log(b1.readUInt8(8)); // throws exception  

As you can see, we can read the individual bytes and use the optional offset to vary the position we are reading from.

A more complex example is reading 16-bit integers. In this example, they are stored in the buffer in little-endian format.

let b2 = Buffer.from([1,0,2,0,3,0,4,0]);  
console.log(b2.readUInt16LE());  // logs 1, read 1,0  
console.log(b2.readUInt16LE(0)); // logs 1, read 1,0  
console.log(b2.readUInt16LE(2)); // logs 2, read 2,0  
console.log(b2.readUInt16LE(6)); // logs 4, read 4,0  

The readUInt16LE method will read two bytes and convert them into a 16-bit unsigned integer. The same holds for the big-endian version:

let b3 = Buffer.from([0,1,0,2,0,3,0,4]);  
console.log(b3.readUInt16BE());  // logs 1, read 0,1  
console.log(b3.readUInt16BE(0)); // logs 1, read 0,1  
console.log(b3.readUInt16BE(2)); // logs 2, read 0,2  
console.log(b3.readUInt16BE(6)); // logs 4, read 0,4  

As you can see, the only difference is that the bytes are swapped between the big and little endian values.

The neat thing is we can read both little-endian and big-endian values from the same buffer:

let b4 = Buffer.from([1,0,0,1,0,2]);  
console.log(b4.readUInt16LE(0)); // logs 1, read 1,0  
console.log(b4.readUInt16BE(2)); // logs 1, read 0,1  
console.log(b4.readUInt16BE(4)); // logs 2, read 0,2  

The first call reads the bytes 0x01 0x00 at position 0 as a little-endian integer. The second reads 0x00 0x01 starting at position 2 as a big-endian integer. Finally, the third reads 0x00 0x02 starting at position 4, also as a big-endian integer.

We can also directly read bytes by using the slice method:

let b5 = Buffer.from([0,0,1,1]);  
console.log(b5.slice());     // logs <Buffer 00 00 01 01>  
console.log(b5.slice(0));    // logs <Buffer 00 00 01 01>  
console.log(b5.slice(0, 2)); // logs <Buffer 00 00>  
console.log(b5.slice(2, 4)); // logs <Buffer 01 01>  
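
One detail worth knowing is that a slice is a view onto the same memory rather than a copy. The sketch below uses subarray, which covers the same range as slice and is the method recent Node.js documentation recommends; the variable names are illustrative.

```javascript
const original = Buffer.from([0, 0, 1, 1]);

const view = original.subarray(2, 4); // same range as slice(2, 4)
view[0] = 9;                          // writes through to the original
console.log(original);                // logs <Buffer 00 00 09 01>

const copy = Buffer.from(original.subarray(0, 2)); // independent copy
copy[0] = 7;
console.log(original[0]);             // logs 0, the original is untouched
```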

Lastly, we can combine all of the above to read an assortment of data from the Buffer:

let b6 = Buffer.from([1, 0, 2, 3, 0, 0, 0, 0, 4, 5, 5, 5]);  
console.log(b6.readUInt8(0)); // logs 1  
console.log(b6.readUInt16BE(1)); // logs 2  
console.log(b6.readUInt16LE(3)); // logs 3  
console.log(b6.readUInt32BE(5)); // logs 4  
console.log(b6.slice(9, 12)); // logs <Buffer 05 05 05>  

With this, we read:

  • the byte at position 0
  • a big-endian uint16 from positions 1 and 2
  • a little-endian uint16 from positions 3 and 4
  • a big-endian uint32 from positions 5, 6, 7, and 8
  • a slice of bytes from positions 9, 10, and 11
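
In practice it is common to track the offset in a variable instead of hard-coding each position. The sketch below reads the same layout that way. The function name parseRecord and the field names are hypothetical and chosen only for illustration.

```javascript
// Hypothetical parser for the layout listed above.
function parseRecord(buf) {
  let offset = 0;

  const first = buf.readUInt8(offset);     // position 0
  offset += 1;
  const second = buf.readUInt16BE(offset); // positions 1 and 2
  offset += 2;
  const third = buf.readUInt16LE(offset);  // positions 3 and 4
  offset += 2;
  const fourth = buf.readUInt32BE(offset); // positions 5 through 8
  offset += 4;
  const rest = buf.subarray(offset, offset + 3); // positions 9 through 11

  return { first, second, third, fourth, rest };
}

const record = parseRecord(Buffer.from([1, 0, 2, 3, 0, 0, 0, 0, 4, 5, 5, 5]));
console.log(record.first, record.second, record.third, record.fourth); // logs 1 2 3 4
console.log(record.rest.toString('hex')); // logs 050505
```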

Hopefully this tutorial has helped you understand how we can use Node buffers to read binary data.
