For multi-byte variables such as int, it matters how the sequence of bytes
is stored in memory.
The two possibilities are little endian and big endian.
But first, let's recap how hexadecimal values are written and interpreted
and how their bits are stored.
The following variable i is of type int and uses four bytes of storage:
int i = 0x12345678;
Each hexadecimal digit corresponds to one nibble (four bits) and has a value from 0 to f (f representing 15).
In this example, the 8 is the least significant digit; its value is multiplied by $16^0 = 1$.
The next digit from the right, 7, must be multiplied by $16^1 = 16$,
thus it adds the value $7 \cdot 16 = 112$.
The other nibbles are handled accordingly up to the leftmost (8th) place (digit 1), which has a factor of
$16^7 = 268435456$.
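The place-value rule can be sketched in C. This is only an illustration: the names digit_value() and hex_string_value() are made up for this example, the input is assumed to hold at most eight hex digits, and the character set is assumed to be ASCII.

```c
#include <stdint.h>
#include <string.h>

/* Value of one hex digit character; anything else counts as 0 (sketch only). */
static uint32_t digit_value(char c)
{
    if (c >= '0' && c <= '9') return (uint32_t)(c - '0');
    if (c >= 'a' && c <= 'f') return (uint32_t)(c - 'a' + 10);
    if (c >= 'A' && c <= 'F') return (uint32_t)(c - 'A' + 10);
    return 0;
}

/*
 * Add up digit * 16^place, starting with the rightmost digit (place 0).
 * Assumes at most eight digits so that the sum fits into 32 bits.
 */
uint32_t hex_string_value(const char *hex)
{
    uint32_t sum = 0;
    uint32_t factor = 1;                      /* 16^0 for the rightmost digit */
    for (size_t i = strlen(hex); i > 0; i--) {
        sum += digit_value(hex[i - 1]) * factor;
        factor *= 16;                         /* next place to the left */
    }
    return sum;
}
```

Called with "78", the function adds 8 * 1 and 7 * 16 and returns 120.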
Each nibble can easily be converted to binary because there are only 16 different values.
For example: 8 = 0b1000.
Writing the nibbles one after another converts the whole hex value to binary:
0x12345678
= 0x 1 2 3 4 5 6 7 8
= 0b 0001 0010 0011 0100 0101 0110 0111 1000
As in decimal numbers, the least significant bit is on the far right and the most significant bit is on the far left.
Bitfields are usually displayed with bit 0 (the least significant) to the right and
with increasing bit positions to the left.
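The nibble-by-nibble conversion can be written as a small C function. This is a minimal sketch; the name u32_to_binary() and the fixed 40-character buffer are choices made for this example.

```c
#include <stdint.h>

/*
 * Write the 32 bits of value into out as eight groups of four binary
 * digits, most significant nibble first, separated by spaces.
 * out needs room for 8 * 4 digits + 7 spaces + '\0' = 40 chars.
 */
void u32_to_binary(uint32_t value, char out[40])
{
    int pos = 0;
    for (int nibble = 7; nibble >= 0; nibble--) {
        unsigned digit = (value >> (4 * nibble)) & 0xfu;   /* one hex digit */
        for (int bit = 3; bit >= 0; bit--)                 /* bit 3 is leftmost */
            out[pos++] = ((digit >> bit) & 1u) ? '1' : '0';
        if (nibble > 0)
            out[pos++] = ' ';
    }
    out[pos] = '\0';
}
```

For 0x12345678 the buffer then holds 0001 0010 0011 0100 0101 0110 0111 1000, with bit 0 as the rightmost character.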
Little endian
Systems using little endian byte order store the least significant byte at the lowest address.
As in decimal numbers, the rightmost digit (8) is the least significant
and the leftmost digit (1) is the most significant.
Two hexadecimal digits (two nibbles) are stored in one byte:
| ↑ high addresses | 0x1003 | 0x12 | most significant byte  |
|                  | 0x1002 | 0x34 |                        |
|                  | 0x1001 | 0x56 |                        |
| ↓ low addresses  | 0x1000 | 0x78 | least significant byte |
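To see this layout on a real machine, the bytes of a variable can be copied out in address order. The following is a sketch with hypothetical helper names (bytes_in_memory_order(), print_memory_layout()); what it shows depends on the byte order of the host it runs on.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Copy the four bytes of value into bytes[], lowest address first. */
void bytes_in_memory_order(uint32_t value, unsigned char bytes[4])
{
    memcpy(bytes, &value, sizeof value);
}

/* Print one line per byte, lowest address (offset 0) first. */
void print_memory_layout(uint32_t value)
{
    unsigned char bytes[4];
    bytes_in_memory_order(value, bytes);
    for (size_t i = 0; i < sizeof bytes; i++)
        printf("offset %zu: %02x\n", i, (unsigned)bytes[i]);
}
```

On a little endian host such as x86, print_memory_layout(0x12345678) lists 78 at offset 0 and 12 at offset 3; on a big endian host the order is reversed.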
This becomes confusing when multiple bytes are displayed in a row.
If the bytes in the row are numbered increasing from right to left, the above sequence of digits
can be recognized easily:
| 0x100f | 0e | 0d | 0c | 0b | 0a | 09 | 08 | 07 | 06 | 05 | 04 | 03 | 02 | 01 | 0x1000 |
| 00     | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 12 | 34 | 56 | 78     |
But if the bytes are numbered (more intuitively) from left to right,
the sequence of pairs is reversed:
| 0x1000 | 01 | 02 | 03 | 04 | 05 | 06 | 07 | 08 | 09 | 0a | 0b | 0c | 0d | 0e | 0x100f |
| 78     | 56 | 34 | 12 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 00     |
Note that only the bytes (pairs of hexadecimal digits) are in a different sequence;
within each pair, the less significant digit stays on the right side
(similar to decimal numbers like 42).
Some hex viewers show pairs of bytes (short, words). When these are
stored in little endian format and displayed from left to right, they show up as: 5678 1234.
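How such a viewer arrives at 5678 1234 can be sketched as follows. The function name bytes_to_le_words() is invented for this example, and the byte count is assumed to be even.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Combine each pair of bytes into a 16-bit word the way a little endian
 * machine reads it: the byte at the lower address is the low half.
 * count is the number of bytes and is assumed to be even.
 */
void bytes_to_le_words(const unsigned char *bytes, size_t count, uint16_t *words)
{
    for (size_t i = 0; i + 1 < count; i += 2)
        words[i / 2] = (uint16_t)(bytes[i] | (bytes[i + 1] << 8));
}
```

Fed with the memory contents 78 56 34 12, the two resulting words are 0x5678 and 0x1234.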
The first version appears preferable, at least for multi-byte integers.
With character strings, this is different:
char str[] = "Hello world";
This is comparable to an array with the letter H in the first element,
i.e. at the lowest address.
Placing this string after the variable i and showing addresses
increasing from right to left:
| 0x100f | 0e  | 0d  | 0c  | 0b  | 0a  | 09  | 08  | 07  | 06  | 05  | 04  | 03 | 02 | 01 | 0x1000 |
| '\0'   | 'd' | 'l' | 'r' | 'o' | 'w' | ' ' | 'o' | 'l' | 'l' | 'e' | 'H' | 12 | 34 | 56 | 78     |
In the case of strings, addresses increasing from left to right (as one reads English text)
are preferable, at the cost of twisting the integer again:
| 0x1000 | 01 | 02 | 03 | 04  | 05  | 06  | 07  | 08  | 09  | 0a  | 0b  | 0c  | 0d  | 0e  | 0x100f |
| 78     | 56 | 34 | 12 | 'H' | 'e' | 'l' | 'l' | 'o' | ' ' | 'w' | 'o' | 'r' | 'l' | 'd' | '\0'   |
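A row with addresses increasing from left to right is easy to produce in code. The following sketch formats any block of memory that way; hex_row() is a hypothetical helper, not part of any library.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format count bytes starting at mem as two-digit hex values separated by
 * spaces, lowest address on the left.
 * out needs 3 * count chars (at least 1 char when count is 0).
 */
void hex_row(const void *mem, size_t count, char *out)
{
    const unsigned char *p = mem;
    out[0] = '\0';
    for (size_t i = 0; i < count; i++)
        sprintf(out + 3 * i, "%02x%s", (unsigned)p[i],
                i + 1 < count ? " " : "");
}
```

Applied to the string, the row starts with the code of 'H' and ends with the terminating zero byte, whatever the host's byte order. Applied to an integer, the same function shows the bytes in the host's storage order.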
Big endian
In big endian, the least significant byte is stored at the highest address:
| ↑ high addresses | 0x1003 | 0x78 | least significant byte |
|                  | 0x1002 | 0x56 |                        |
|                  | 0x1001 | 0x34 |                        |
| ↓ low addresses  | 0x1000 | 0x12 | most significant byte  |
In this byte order, addresses are usually displayed increasing from left
to right, as this makes both the multi-byte integer and the string readable:
| 0x1000 | 01 | 02 | 03 | 04  | 05  | 06  | 07  | 08  | 09  | 0a  | 0b  | 0c  | 0d  | 0e  | 0x100f |
| 12     | 34 | 56 | 78 | 'H' | 'e' | 'l' | 'l' | 'o' | ' ' | 'w' | 'o' | 'r' | 'l' | 'd' | '\0'   |
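A big endian byte sequence can be produced on any host with shifts alone, because shifts work on the value and not on the storage layout. A minimal sketch, with the made-up names store_be32() and load_be32():

```c
#include <stdint.h>

/* Store value into buf with the most significant byte at the lowest address. */
void store_be32(uint32_t value, unsigned char buf[4])
{
    buf[0] = (unsigned char)(value >> 24);   /* most significant byte  */
    buf[1] = (unsigned char)(value >> 16);
    buf[2] = (unsigned char)(value >> 8);
    buf[3] = (unsigned char)value;           /* least significant byte */
}

/* Read the value back from a big endian byte sequence. */
uint32_t load_be32(const unsigned char buf[4])
{
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
}
```

store_be32(0x12345678, buf) leaves 12 34 56 78 in buf[0] to buf[3], matching the table, regardless of the byte order of the machine running it.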
Network byte order
Documents that are exchanged between systems, and especially network transmissions,
must take the byte order into account.
The internet protocols define a network byte order, which is big endian.
There are functions to convert between network byte order and host byte order:
#include <netinet/in.h>
uint32_t htonl(uint32_t hostlong); // host to network, long (32 bit)
uint32_t ntohl(uint32_t netlong);  // network to host, long (32 bit)
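A short sketch of how these two functions are typically used when writing a 32-bit value into a transmission buffer and reading it back. The wrapper names put_net32() and get_net32() are invented for this example; htonl() and ntohl() are the POSIX functions.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Write value into buf in network byte order (big endian). */
void put_net32(uint32_t value, unsigned char buf[4])
{
    uint32_t net = htonl(value);        /* no-op on big endian hosts */
    memcpy(buf, &net, sizeof net);
}

/* Read a value in network byte order from buf and return it in host order. */
uint32_t get_net32(const unsigned char buf[4])
{
    uint32_t net;
    memcpy(&net, buf, sizeof net);
    return ntohl(net);
}
```

After put_net32(0x12345678, buf), the buffer holds 12 34 56 78 on every host, and get_net32(buf) returns the original value.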
Programming
When does a program need to care about endianness?
Of course, the byte order matters when exchanging data with other instances
(other programs or the same program running on a different system),
either via files or over the network.
It can be ignored only if all systems use the same byte order
(for example, all are x86 systems).
The internet protocol (BSD sockets) libraries use network byte order
and require the IP address to be converted with htonl()
(and the port number with htons()).
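As an illustration, a socket address for the loopback interface might be filled in like this. The function name make_loopback_address() is an assumption made for this sketch; the structure, the constants and the conversion functions come from the standard sockets headers.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Fill addr for 127.0.0.1 and the given port (port is in host byte order). */
void make_loopback_address(struct sockaddr_in *addr, uint16_t port)
{
    memset(addr, 0, sizeof *addr);
    addr->sin_family = AF_INET;
    addr->sin_port = htons(port);                    /* 16 bit: htons() */
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* 32 bit: htonl() */
}
```

INADDR_LOOPBACK is defined in host byte order, so the htonl() call is what places 127 at the lowest address of the address field.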
In internal data structures, the byte order matters if unions or pointers
are used to access portions of other variables.
As long as only math operations and value casts are used,
it can be ignored:
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    union {
        uint32_t u32;
        uint8_t  u8[4];
    } demo;

    demo.u32 = 0x12345678;

    /*
     * Accessing the bytes by address: the result depends on the byte order
     * (78 and 12 on little endian, 12 and 78 on big endian).
     */
    printf("lowest address:  u8[0] = %x\n", (unsigned)demo.u8[0]);
    printf("highest address: u8[3] = %x\n", (unsigned)demo.u8[3]);

    /*
     * Using math operations, the least significant byte can be masked or
     * calculated independently of the byte order.
     */
    printf("least significant byte: %x\n", (unsigned)(demo.u32 % 256));               // modulo
    printf("least significant byte: %x\n", (unsigned)(demo.u32 & 0xff));              // bitwise AND
    printf("most significant byte:  %x\n", (unsigned)(demo.u32 / (256 * 256 * 256))); // division
    printf("most significant byte:  %x\n", (unsigned)(demo.u32 >> 24));               // bit shift
    return 0;
}
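The pointer case works the same way as the union: looking at the first byte of a multi-byte variable reveals the byte order of the host. A minimal sketch, with the hypothetical name host_is_little_endian():

```c
#include <stdint.h>

/* Return 1 on a little endian host, 0 otherwise. */
int host_is_little_endian(void)
{
    const uint16_t probe = 0x0102;
    /* Reading an object through an unsigned char pointer is always allowed. */
    const unsigned char *first = (const unsigned char *)&probe;
    return *first == 0x02;   /* low byte at the lowest address? */
}
```

Code that needs such a check is usually better served by the shift-based or htonl()-based approaches, which give the same result on every host without a runtime test.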
Further reading