BYTEORDER(5)          Standards, Environments, and Macros          BYTEORDER(5)

NAME
     byteorder, endian - byte order and endianness

DESCRIPTION
     Integer values which occupy more than 1 byte in memory can be laid out
     in different ways on different platforms. In particular, there is a
     major split between those which place the least significant byte of an
     integer at the lowest address, and those which place the most
     significant byte there instead. As this difference relates to which end
     of the integer is found in memory first, the term endian is used to
     refer to a particular byte order.

     A platform is referred to as using a big-endian byte order when it
     places the most significant byte at the lowest address, and
     little-endian when it places the least significant byte first. Some
     platforms may also switch between big- and little-endian mode and run
     code compiled for either.

     Historically, there have also been some systems that utilized
     middle-endian byte orders for integers larger than 2 bytes. Such
     orderings are not in common use today.

     Endianness is also of particular importance when dealing with values
     that are being read into memory from an external source. For example,
     network protocols such as IP conventionally define the fields in a
     packet as always being stored in big-endian byte order. This means that
     a little-endian machine has to convert these fields before it can
     process them.

   Examples
     To illustrate endianness in memory, let us consider the decimal integer
     2864434397 (0xAABBCCDD in hexadecimal). This number fits in 32 bits of
     storage (4 bytes).

     On a big-endian system, this integer would be written into memory as
     the bytes 0xAA, 0xBB, 0xCC, 0xDD, in order from lowest memory address to
     highest.

     On a little-endian system, it would be written instead as the bytes
     0xDD, 0xCC, 0xBB, 0xAA, in that order.
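The example above can be observed directly on the host. The following is a minimal C99 sketch, not an illumos interface: the helpers host_bytes32() and print_bytes32() are hypothetical names. It copies the in-memory representation of a 32-bit value into a byte array, lowest address first, using memcpy() rather than a pointer cast to sidestep alignment and aliasing concerns.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: copy the in-memory representation of a 32-bit
 * value into out[0..3], lowest address first.  The result therefore
 * depends on the host's byte order.
 */
void
host_bytes32(uint32_t value, unsigned char out[4])
{
	assert(out != NULL);
	(void) memcpy(out, &value, sizeof (value));
}

/* Hypothetical helper: print the bytes of a 32-bit value in address order. */
void
print_bytes32(uint32_t value)
{
	unsigned char b[4];

	host_bytes32(value, b);
	(void) printf("0x%02X, 0x%02X, 0x%02X, 0x%02X\n",
	    b[0], b[1], b[2], b[3]);
}
```

Calling print_bytes32(2864434397U) prints 0xDD, 0xCC, 0xBB, 0xAA on a little-endian host and 0xAA, 0xBB, 0xCC, 0xDD on a big-endian host.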
     If both the big- and little-endian systems were asked to store this
     integer at address 0x100, we would see the following in their
     respective memories:

          Big-Endian
          +------+------+------+------+
          | 0xAA | 0xBB | 0xCC | 0xDD |
          +------+------+------+------+
             ^      ^      ^      ^
           0x100  0x101  0x102  0x103
             v      v      v      v
          +------+------+------+------+
          | 0xDD | 0xCC | 0xBB | 0xAA |
          +------+------+------+------+
          Little-Endian

     It is particularly important to note that even though the byte order is
     different between these two machines, the bit ordering within each
     byte, by convention, is still the same.

     For example, take the decimal integer 4660 (0x1234), which occupies 16
     bits (2 bytes). On a big-endian system, this would be written into
     memory as 0x12, then 0x34. On a little-endian system, it would be
     written as 0x34, then 0x12. Note that this is not at all the same as
     seeing 0x43 then 0x21 in memory; only the bytes are re-ordered, not any
     bits (or nybbles) within them.

     As before, storing this at address 0x100:

          Big-Endian
          +------+------+
          | 0x12 | 0x34 |
          +------+------+
             ^      ^
           0x100  0x101
             v      v
          +------+------+
          | 0x34 | 0x12 |
          +------+------+
          Little-Endian

     This example shows how an eight-byte number, 0xBADCAFFEDEADBEEF, is
     stored in both big- and little-endian:

          Big-Endian
          +------+------+------+------+------+------+------+------+
          | 0xBA | 0xDC | 0xAF | 0xFE | 0xDE | 0xAD | 0xBE | 0xEF |
          +------+------+------+------+------+------+------+------+
             ^      ^      ^      ^      ^      ^      ^      ^
           0x100  0x101  0x102  0x103  0x104  0x105  0x106  0x107
             v      v      v      v      v      v      v      v
          +------+------+------+------+------+------+------+------+
          | 0xEF | 0xBE | 0xAD | 0xDE | 0xFE | 0xAF | 0xDC | 0xBA |
          +------+------+------+------+------+------+------+------+
          Little-Endian

     The treatment of different endian values would not be complete without
     discussing PDP-endian, which is also known as middle-endian.
     While the PDP-11 was a 16-bit little-endian system, it laid out 32-bit
     values in a different way from current little-endian systems. First, it
     would divide a 32-bit number into two 16-bit numbers. Each 16-bit number
     would be stored in little-endian; however, the two 16-bit words would be
     stored with the more significant 16-bit word appearing first in memory,
     followed by the less significant one.

     The following diagram illustrates PDP-endian and compares it against
     little-endian. Here, we'll start with the value 0xAABBCCDD and show how
     its four bytes are laid out, starting at 0x100.

          PDP-Endian
          +------+------+------+------+
          | 0xBB | 0xAA | 0xDD | 0xCC |
          +------+------+------+------+
             ^      ^      ^      ^
           0x100  0x101  0x102  0x103
             v      v      v      v
          +------+------+------+------+
          | 0xDD | 0xCC | 0xBB | 0xAA |
          +------+------+------+------+
          Little-Endian

   Network Byte Order
     The term 'network byte order' refers to big-endian ordering, and
     originates from the IEEE. Early disagreements over which byte ordering
     to use for network traffic prompted RFC1700 to define that all
     IETF-specified network protocols use big-endian ordering unless noted
     explicitly otherwise. The Internet protocol family (IP, and thus TCP,
     UDP, etc.) adheres to this convention.

   Determining the System's Byte Order
     The operating system supports both big-endian and little-endian CPUs.
     To make it easier for programs to determine the endianness of the
     platform they are being compiled for, functions and macro constants are
     provided in the system header files.

     The endianness of the system can be obtained by including the header
     <sys/types.h> and using the pre-processor macros _LITTLE_ENDIAN and
     _BIG_ENDIAN. See types.h(3HEAD) for more information.

     Additionally, the header <endian.h> defines an alternative means for
     determining the endianness of the current system. See endian.h(3HEAD)
     for more information.
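A sketch of using these macros, assuming the illumos convention described above that <sys/types.h> defines _LITTLE_ENDIAN or _BIG_ENDIAN. Other systems may use the same names differently (some BSDs define both as numeric constants), so this sketch trusts them only when the compiler defines __sun, and otherwise falls back to a run-time probe. The function name host_is_little_endian() is hypothetical, and middle-endian hosts are not considered.

```c
#include <sys/types.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 on a little-endian host and 0 on a
 * big-endian one.  Assumes the host is not middle-endian.
 */
int
host_is_little_endian(void)
{
#if defined(__sun) && defined(_LITTLE_ENDIAN)
	return (1);		/* illumos: decided at compile time */
#elif defined(__sun) && defined(_BIG_ENDIAN)
	return (0);
#else
	/* Fallback: check which byte of 0x0001 sits at the lowest address. */
	uint16_t one = 1;
	unsigned char first;

	(void) memcpy(&first, &one, 1);
	return (first == 1);
#endif
}
```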
     illumos runs on both big- and little-endian systems. When writing
     software for which the endianness is important, one must always check
     the byte order and convert it appropriately.

   Converting Between Byte Orders
     The system provides two different sets of functions to convert values
     between big-endian and little-endian. They are defined in byteorder(3C)
     and endian(3C).

     The byteorder(3C) family of functions converts data between the host's
     native byte order and network (big-endian) byte order. The functions
     operate on either 16-bit, 32-bit, or 64-bit values. Functions that
     convert from network byte order to the host's byte order start with the
     string ntoh, while functions which convert from the host's byte order to
     network byte order begin with hton. For example, to convert a 32-bit
     value (the 'l' suffix historically stood for 'long') from network byte
     order to the host's, one would use the function ntohl(3C).

     These functions have been standardized by POSIX. However, the 64-bit
     variants, ntohll(3C) and htonll(3C), are not standardized and may not be
     found on other systems. For more information on these functions, see
     byteorder(3C).

     The second family of functions, endian(3C), provides a means to convert
     between the host's byte order and big-endian and little-endian
     specifically. While these functions are similar to those in
     byteorder(3C), they more explicitly cover different data conversions.
     Like them, these functions operate on either 16-bit, 32-bit, or 64-bit
     values. When converting from big-endian to the host's endianness, the
     functions begin with betoh. If instead one is converting data from the
     host's native endianness to big-endian, the function begins with htobe.
     When working with little-endian data, the prefixes letoh and htole
     convert little-endian data to the host's endianness and from the host's
     to little-endian, respectively. These functions are not standardized and
     the header they appear in varies between the BSDs and GNU/Linux.
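As a sketch of the byteorder(3C) functions in use, the hypothetical helpers put_net32() and get_net32() below write a 32-bit field into a buffer in network byte order and read it back, as one might when building or parsing a protocol header. The sketch assumes the POSIX declarations of htonl() and ntohl() in <arpa/inet.h>; memcpy() is used because a field inside a packet buffer need not be suitably aligned for a direct uint32_t access.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store a host-order value in network byte order. */
void
put_net32(unsigned char out[4], uint32_t host)
{
	uint32_t net = htonl(host);

	assert(out != NULL);
	(void) memcpy(out, &net, sizeof (net));
}

/* Hypothetical helper: load a network-order field back into host order. */
uint32_t
get_net32(const unsigned char in[4])
{
	uint32_t net;

	assert(in != NULL);
	(void) memcpy(&net, in, sizeof (net));
	return (ntohl(net));
}
```

After put_net32(buf, 0xAABBCCDD), buf holds 0xAA, 0xBB, 0xCC, 0xDD regardless of the host's byte order, and get_net32(buf) returns 0xAABBCCDD.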
     Applications that wish to be portable should instead use the
     byteorder(3C) functions.

     All of these functions in both families simply return their input when
     the host's native byte order is the same as the desired order. For
     example, when calling htonl(3C) on a big-endian system, the original
     data is returned with no conversion or modification.

SEE ALSO
     byteorder(3C), endian(3C), endian.h(3HEAD), inet(3HEAD)

illumos                         August 2, 2018                         illumos