If you have a struct like the following, could you predict its size?

/* assuming 64 bit machine */
struct human {
    int age;        /* 4 bytes */
    char *name;     /* 8 bytes */
    char ID;        /* 1 byte  */
};

I have found an answer to this question is a good way to learn about different memory topics such as Byte addressable, endianness, data alignment, and structure padding.

With arrays, it is trivial to know its size and how to access individaul elements. Assuming an integer is 32 bits. An array of integers with 10 elements would have:

(int) 4 * 10 = 40 Bytes.

With structs data type, this question is tricky. If we try to do the same, adding individual elements together, we would have:

(char *) 8 + (char) 1 + (int) 4 = 13 Bytes.

But that’s quite wrong as can be seen from using sizeof(struct human) and the output of Pahole below, which is a utility program that could give you information about your data structres:

struct human {
    int         age;        /*      0       4 */

    /* XXX 4 bytes hole, try to pack */

    char *      name;       /*      8       8 */
    char        ID;         /*      16      1 */

    /* size: 24, cachelines: 1, members: 3 */
    /* sum members: 13, holes: 1, sum holes: 4 */
    /* padding: 7 */
    /* last cacheline: 24 bytes */
};

Data Alignment

Although Memory is arranged sequentially and is byte addressable, data is read from it rather in word sized chuncks minimizing number of bus cycle required to access data.

For example, assuming a 32 bit machine with word size of 4, the layout can be organized as below. This layout shows that, we could access the data in one bus cycle, or more based on the start address of the data.

To access a data type that reside in an odd address for example, would require extra bus cycle.


Bank 0 Bank 1 Bank 2 Bank 3
0x0000 X1 X2 X3 X4
0x0004 - - - Y1
0x0008 Y2 Y3 Y4 -

Based on that, to access data efficiently, data types doesn’t start at arbitrary addresses in memory. Data types has default alignment (a pre determined boundary) that’s not always the same as its size as it may appear. The alignemnt requirement means that the address that the data type has to start on shoud be multiple of the alignment. For example, a long or double with 8 byte alignment requirement should start on an address that is divisible by 8. Chars are exception, they can start on any byte address.

You could chekout type’s alignment using the alignof keyword as follow:

    printf("int alignment is: %lu", __alignof__(int));

Structure Padding

Padding is the process of inserting additional bytes by the compiler to make the data elements aligned. These extra bytes are inserted either before a larger alignment requirement or at the end of the structure, for the total size of the structure should be multiple of the largest alignment of any structre member. This as to ensure that all the members are self-aligned. Thus, is’s possible to change the amount of padding to reduce the memory requirment by reordering the elements.

After organizing the elements in descending order, the size was reduced from 24 bytes to 16 bytes.

struct human {
    char *      name;       /*      0       8 */
    int         age;        /*      12      4 */
    char        ID;         /*      8       1 */

    /* size: 16, cachelines: 1, members: 3 */
    /* padding: 3 */
    /* last cacheline: 16 bytes */
};

Refrences: