TK's Newsletter

Structs, Padding, and Memory Layout

How are structs represented? And why does a struct sometimes take more memory than it needs?

TK
Sep 20, 2022

Structs, or structures, are a tool that programming languages offer us for defining our own types: composite data types made up of other types. For example, we can define a type Person consisting of an integer age value and a double height value. In C, this would look something like this:


struct Person{
     double height;
     int age;
};

Let’s assume that we are on a 64-bit CPU architecture and that a double type takes 8 bytes, and an int type takes 4 bytes.

How many bytes in memory does a variable of type struct Person consume?

If you do the math, height takes 8 bytes and age takes 4 bytes, so 8 + 4 = 12 bytes. Although this answer seems reasonable, it’s wrong. On a typical 64-bit platform, a variable of type struct Person actually takes 16 bytes.

Let me explain:

CPUs don’t read and write memory one bit at a time. They work on fixed-size chunks called words. On 64-bit CPUs, the word size is 64 bits (8 bytes); on 32-bit CPUs, it is 32 bits (4 bytes). The CPU reads and writes memory word by word: one word, followed by another, followed by another.

Let’s say we have a 4-byte integer value, and that 2 bytes of it are placed at the end of one word while the other 2 bytes are placed at the start of the next. That integer now straddles a word boundary: to read its value, the CPU must read the first word and then the second. That’s 2 operations just to read one 4-byte integer. The integer could easily fit into a single word, so why let it span two?

This cost can be cut in half by placing values in memory so that they never cross a word boundary. Each value of a basic type is placed at a memory address that is a multiple of its size. For example, a 4-byte integer value must be placed at a memory address that is a multiple of 4: the address can be 0, 4, 8, 12, etc. A 2-byte value is placed at a memory address that is a multiple of 2, such as 0, 2, 1002, 2324, etc. You get the pattern. This way, a value never spans more than one word and takes only one operation from the CPU to read or write. To make this work, the compiler often has to introduce padding. For example, let’s take a look at the struct below:

struct Teacher {
     char department;
     int id;
     double salary; 
};

department takes up 1 byte, id takes up 4 bytes, and salary takes up 8 bytes.

The department value takes up the single byte at memory address 0, followed by id. But id (a 4-byte value) can’t be placed at a memory address that is not a multiple of 4. So what does the compiler do? It adds padding: extra allocated memory, 3 bytes in this example, between department and id, so that id lands on a memory address that is a multiple of four (id now occupies addresses 4 through 7). The salary value needs no padding, as it falls on memory address 8, which is a multiple of 8.

Let’s go back to our Person struct to better understand why it consumes 16 bytes of memory. Let’s say the height double falls on memory address 0, so the height value takes up memory addresses 0 through 7. The 4-byte age value must fall on a memory address that is a multiple of 4, and 8 is a multiple of 4.

The memory layout will look like this: the 8-byte height value takes up memory addresses 0 through 7, and the 4-byte age value takes up memory addresses 8 through 11. This still adds up to 12 bytes, and there don’t seem to be any padding issues.

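There is one last thing that I haven’t mentioned yet: a struct as a whole must be aligned to the alignment of its most strictly aligned field (here, its largest field), and its total size must be a multiple of that alignment. Otherwise, in an array of structs, where elements sit back to back, the fields of the second element would end up misaligned.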

All of the fields of the Person struct take up a total of 12 bytes, and 12 is not a multiple of the size of the struct’s largest field (the 8-byte height double). So the compiler adds padding to the end of the struct to make sure the struct aligns correctly in memory. In this example, the compiler adds 4 bytes of padding so that the struct occupies 16 bytes of memory, satisfying the multiple-of-8 requirement.

The C example below demonstrates the details described above (the sizeof operator in C yields the number of bytes a type occupies in memory):

#include <stdio.h>

struct test{
    double test;
    int i;
};

int main(void)
{
    /* sizeof yields a size_t, so print it with %zu rather than %lu */
    printf("%zu\n", sizeof(double));
    printf("%zu\n", sizeof(int));
    printf("%zu\n", sizeof(struct test));

    return 0;
}

// Result of the program execution (on a typical 64-bit platform):

8
4
16
