georg's blog

Endianness

For multi-byte variables such as int, it matters in which order the bytes are stored in memory. The two common possibilities are little endian and big endian.

But first, let's recap how hexadecimal values are written and interpreted and how their bits are stored. The following variable i is of type int and, on typical platforms, uses four bytes of storage:

int i = 0x12345678;

Each hexadecimal digit represents a nibble (four bits) and has a value from 0 to f (f representing 15). In this example, the 8 is the least significant digit; its value is multiplied by $16^{0} = 1$. The next digit from the right, 7, must be multiplied by $16^{1} = 16$, so it contributes $7 \cdot 16 = 112$. The other nibbles are handled accordingly, up to the leftmost (8th) digit, 1, which has a factor of $16^{7} = 268\,435\,456$.

Each nibble can easily be converted to binary because there are only 16 different values. For example: 8 = 0b1000. Writing the four bits of each nibble side by side converts the whole hex value to binary:

0x12345678
= 0x    1    2    3    4    5    6    7    8
= 0b 0001 0010 0011 0100 0101 0110 0111 1000

As in decimal numbers, the least significant bit is on the far right and the most significant bit on the far left. Bit fields are usually displayed with bit 0 (the least significant) on the right and with increasing bit positions to the left.
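The place-value rule and the nibble-to-binary conversion can be checked with a few lines of C. This is only a minimal sketch; the helper names nibble_at, sum_of_place_values and nibble_to_binary are made up for this example and are not library functions.

```c
#include <stdint.h>

/* Return the hex digit (nibble) at position pos; pos 0 is the least
 * significant digit. Hypothetical helper for illustration. */
unsigned nibble_at(uint32_t value, unsigned pos)
{
    return (unsigned)((value >> (4 * pos)) & 0xfu);
}

/* Rebuild the value as the sum of digit * 16^pos over all eight digits. */
uint32_t sum_of_place_values(uint32_t value)
{
    uint32_t sum = 0;
    uint32_t factor = 1;                /* 16^0 */
    for (unsigned pos = 0; pos < 8; pos++) {
        sum += nibble_at(value, pos) * factor;
        factor *= 16;                   /* wraps to 0 after the last digit, which is harmless */
    }
    return sum;
}

/* Write the four bits of a nibble as text, most significant bit first.
 * out must have room for four characters plus the terminating '\0'. */
void nibble_to_binary(unsigned nibble, char out[5])
{
    for (int bit = 3; bit >= 0; bit--)
        out[3 - bit] = ((nibble >> bit) & 1u) ? '1' : '0';
    out[4] = '\0';
}
```

For 0x12345678, nibble_at(value, 0) yields 8 and nibble_at(value, 7) yields 1, and summing all eight weighted digits gives back the original value.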

Little endian

Systems using little endian byte order store the least significant byte at the lowest address.

int i = 0x12345678;

As in decimal numbers, the rightmost digit (8) is the least significant and the leftmost digit (1) is the most significant. Two hexadecimal digits (two nibbles) are stored in one byte:

↑ large addresses	0x1003	12hex	most significant byte
	0x1002	34hex
	0x1001	56hex
↓ small addresses	0x1000	78hex	least significant byte

This becomes confusing when multiple bytes are displayed in a row. If the bytes in the row are numbered in increasing order from right to left, the above sequence of digits can be recognized easily:

0x100f	0e	0d	0c	0b	0a	09	08	07	06	05	04	03	02	01	0x1000
00	00	00	00	00	00	00	00	00	00	00	00	12	34	56	78

But if the bytes are numbered (more intuitively) from left to right, the sequence of pairs is reversed:

0x1000	01	02	03	04	05	06	07	08	09	0a	0b	0c	0d	0e	0x100f
78	56	34	12	00	00	00	00	00	00	00	00	00	00	00	00

Note that only the bytes (pairs of hexadecimal digits) change their order; within each pair, the less significant digit remains on the right (as in decimal numbers like 42). Some hex viewers show groups of two bytes (shorts, words). When these are stored in little endian format and displayed from left to right, they show up as: 5678 1234.
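To look at the bytes in the order in which they really sit in memory, one can copy the integer into a byte array. The following is a small sketch; the function names bytes_in_memory_order and le16_at are assumptions made for this example. The second helper mimics what a hex viewer does when it groups two bytes into a little endian 16-bit word.

```c
#include <stdint.h>
#include <string.h>

/* Copy the bytes of value exactly as they are stored, lowest address first.
 * The result depends on the byte order of the host. */
void bytes_in_memory_order(uint32_t value, uint8_t out[4])
{
    memcpy(out, &value, sizeof value);
}

/* Read two neighbouring bytes as a little endian 16-bit word: the byte at
 * the lower address is the less significant one. */
uint16_t le16_at(const uint8_t *mem, unsigned offset)
{
    return (uint16_t)(mem[offset] | (mem[offset + 1] << 8));
}
```

On a little endian host, bytes_in_memory_order(0x12345678, out) fills out with 78 56 34 12. Reading these four bytes with le16_at at offsets 0 and 2 gives 0x5678 and 0x1234, which is the word view mentioned above.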

The first version appears preferable, at least for multi-byte integers. With character strings, this is different:

char str[] = "Hello world";

This is comparable to an array with the letter H in the first element, i.e. at the lowest address. Placing this string after the variable i and showing addresses increasing from right to left gives:

0x100f	0e	0d	0c	0b	0a	09	08	07	06	05	04	03	02	01	0x1000
'\0'	'd'	'l'	'r'	'o'	'w'	' '	'o'	'l'	'l'	'e'	'H'	12	34	56	78

For strings, addresses increasing from left to right (as one reads English text) are preferable, at the price of reversing the integer again:

0x1000	01	02	03	04	05	06	07	08	09	0a	0b	0c	0d	0e	0x100f
78	56	34	12	'H'	'e'	'l'	'l'	'o'	' '	'w'	'o'	'r'	'l'	'd'	'\0'
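The 16-byte layout from the tables above can be reproduced in C. This is a sketch under the assumption that int has four bytes, so int32_t is used here; build_layout and format_row are hypothetical names for this example.

```c
#include <stdint.h>
#include <string.h>

/* Lay out the integer followed by the string in a 16-byte buffer. */
void build_layout(unsigned char mem[16])
{
    int32_t i = 0x12345678;
    char str[] = "Hello world";               /* 11 characters plus '\0' */

    memcpy(mem, &i, sizeof i);                /* bytes 0..3: order depends on the host */
    memcpy(mem + sizeof i, str, sizeof str);  /* bytes 4..15: same on every host */
}

/* Format the 16 bytes as hex pairs, addresses increasing from left to right.
 * out receives 47 characters plus the terminating '\0'. */
void format_row(const unsigned char mem[16], char out[48])
{
    static const char digits[] = "0123456789abcdef";
    for (int k = 0; k < 16; k++) {
        out[3 * k]     = digits[mem[k] >> 4];    /* more significant nibble */
        out[3 * k + 1] = digits[mem[k] & 0x0f];  /* less significant nibble */
        out[3 * k + 2] = ' ';
    }
    out[47] = '\0';                           /* overwrite the trailing space */
}
```

On a little endian host the row starts with 78 56 34 12, on a big endian host with 12 34 56 78. The twelve bytes of the string, including the terminating zero byte, are identical on both.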

Big endian

In big endian, the most significant byte is stored at the lowest address and the least significant byte at the highest address:

↑ large addresses	0x1003	78hex	least significant byte
	0x1002	56hex
	0x1001	34hex
↓ small addresses	0x1000	12hex	most significant byte

In this byte order, addresses are usually displayed increasing from left to right, because this makes both the multi-byte integer and the string readable:

0x1000	.1	.2	.3	.4	.5	.6	.7	.8	.9	.a	.b	.c	.d	.e	0x100f
12	34	56	78	'H'	'e'	'l'	'l'	'o'	' '	'w'	'o'	'r'	'l'	'd'	'\0'
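A program can produce either byte order explicitly, whatever the host uses, by writing the bytes one at a time with shifts. The following sketch shows this; store_be32 and store_le32 are names chosen for this example.

```c
#include <stdint.h>

/* Big endian: most significant byte at the lowest address. */
void store_be32(uint8_t out[4], uint32_t value)
{
    out[0] = (uint8_t)(value >> 24);
    out[1] = (uint8_t)(value >> 16);
    out[2] = (uint8_t)(value >> 8);
    out[3] = (uint8_t)value;
}

/* Little endian: least significant byte at the lowest address. */
void store_le32(uint8_t out[4], uint32_t value)
{
    out[0] = (uint8_t)value;
    out[1] = (uint8_t)(value >> 8);
    out[2] = (uint8_t)(value >> 16);
    out[3] = (uint8_t)(value >> 24);
}
```

For 0x12345678, store_be32 writes 12 34 56 78 and store_le32 writes 78 56 34 12, on any host.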

Network byte order

Documents that are exchanged between systems, and especially network transmissions, must take the byte order into account. The internet protocols define a network byte order (which is big endian). There are functions to convert between network byte order and host byte order:

#include <netinet/in.h>
uint32_t htonl(uint32_t hostlong);   // host to network, long (32 bit)
uint32_t ntohl(uint32_t netlong);    // network to host, long (32 bit)
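A typical use of these functions is writing a 32-bit value into a transmit buffer and reading it back on the other side. This is a minimal sketch, assuming a POSIX system that provides <arpa/inet.h>; the wrapper names put_u32_network and get_u32_network are made up for this example.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Write a host-order value into buf in network byte order (big endian). */
void put_u32_network(unsigned char buf[4], uint32_t host_value)
{
    uint32_t net = htonl(host_value);
    memcpy(buf, &net, sizeof net);
}

/* Read a network-order value from buf and return it in host byte order. */
uint32_t get_u32_network(const unsigned char buf[4])
{
    uint32_t net;
    memcpy(&net, buf, sizeof net);
    return ntohl(net);
}
```

After put_u32_network(buf, 0x12345678), buf holds 12 34 56 78 on every host. On a big endian host, htonl() and ntohl() simply return their argument unchanged.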

Programming

When does a program need to care about endianness?

Of course, the byte order matters when exchanging data with other instances (other programs, or the same program running on a different system), whether via files or over the network. It can be ignored only if all systems use the same byte order (for example, if all are x86 systems).

The BSD sockets API for the internet protocols uses network byte order: IPv4 addresses must be converted with htonl() and port numbers with htons() before they are stored in the socket address structure.
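As an illustration, this is how an IPv4 socket address might be filled in. It is a sketch assuming a POSIX system; make_ipv4_address is a hypothetical helper, and both parameters are expected in host byte order.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Build an IPv4 socket address from an address and a port given in host
 * byte order. The fields of struct sockaddr_in hold network byte order. */
struct sockaddr_in make_ipv4_address(uint32_t ip_host_order, uint16_t port_host_order)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port_host_order);        /* 16-bit conversion */
    addr.sin_addr.s_addr = htonl(ip_host_order);   /* 32-bit conversion */
    return addr;
}
```

Calling make_ipv4_address(0x7f000001, 8080) describes 127.0.0.1, port 8080. The first byte of sin_addr in memory is then 127 regardless of the host byte order.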

In internal data structures, the byte order matters if unions or pointers are used to access parts of other variables. As long as only arithmetic operations and casts are used, it can be ignored:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    union {
        uint32_t u32;
        uint8_t  u8[4];
    } demo;

    demo.u32 = 0x12345678;
    /*
     * Access by address: what comes out depends on the byte order.
     */
    printf("lowest address: u8[0] = %x\n", (unsigned)demo.u8[0]);
    printf("highest address: u8[3] = %x\n", (unsigned)demo.u8[3]);
    /*
     * Arithmetic and bitwise operations work on the value, so a byte can be
     * masked or calculated independently of the byte order.
     */
    printf("least significant byte: %x\n", (unsigned)(demo.u32 % 256));              // modulo
    printf("least significant byte: %x\n", (unsigned)(demo.u32 & 0xff));             // bitwise AND
    printf("most significant byte: %x\n", (unsigned)(demo.u32 / (256 * 256 * 256))); // division
    printf("most significant byte: %x\n", (unsigned)(demo.u32 >> 24));               // bit shift
    return 0;
}
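The same idea works in the other direction: a value can be assembled from bytes in a buffer with shifts and bitwise OR, without looking at the host byte order at all. A small sketch with made-up names load_be32 and load_le32:

```c
#include <stdint.h>

/* Interpret four bytes as a big endian 32-bit value. */
uint32_t load_be32(const uint8_t in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}

/* Interpret four bytes as a little endian 32-bit value. */
uint32_t load_le32(const uint8_t in[4])
{
    return ((uint32_t)in[3] << 24) | ((uint32_t)in[2] << 16) |
           ((uint32_t)in[1] << 8)  |  (uint32_t)in[0];
}
```

For the bytes 12 34 56 78, load_be32 returns 0x12345678 and load_le32 returns 0x78563412 on any host, because only the values of the bytes are used and never their addresses inside an integer.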

