Structs, Padding, and Memory Layout
How are structs represented? And why does a struct sometimes take more memory than it seems to need?
Structs, or structures, are a tool that programming languages offer us for making our own types: composite data types made up of several other types. For example, we can define a type Person, consisting of an integer age value and a double height value. In C, this would look something like this:
struct Person {
    double height;
    int age;
};
Let’s assume that we are on a 64-bit CPU architecture and that a double type takes 8 bytes, and an int type takes 4 bytes.
How many bytes in memory does a variable of type struct Person consume?
If you do the math: height will take 8 bytes, and age will take 4 bytes, 8+4 = 12. It will consume 12 bytes of memory. Although this answer seems reasonable, it’s wrong. A variable of type struct Person will actually take 16 bytes.
Let me explain:
CPUs operate on bits in chunks. They read and write a fixed number of bits at a time. That chunk of data is called a word, and it has a fixed size. On 64-bit CPUs, the word size is 64 bits; on 32-bit CPUs, it is 32 bits. CPUs operate on data with respect to their word size: they read or write one word, followed by another, followed by another.
Let’s say we have a 4-byte integer value, and 2 bytes of that integer are placed in one word while the other 2 bytes are placed in the next word. That integer now crosses a word boundary: to read its value, the CPU must read the first word and then the second word. That’s 2 operations just to read a 4-byte integer value. The integer could easily fit into a single word, so why make it span two?
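To make the boundary-crossing idea concrete, here is a minimal sketch of the arithmetic. The function name crosses_word_boundary is my own invention for illustration, and it models memory as a simple sequence of fixed-size words, which is a simplification of what real hardware does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true if a value of `size` bytes stored at
   address `addr` spans more than one word of `word_size` bytes. */
bool crosses_word_boundary(uintptr_t addr, size_t size, size_t word_size)
{
    assert(size > 0 && word_size > 0);

    uintptr_t first_word = addr / word_size;              /* word holding the first byte */
    uintptr_t last_word = (addr + size - 1) / word_size;  /* word holding the last byte */
    return first_word != last_word;
}
```

With 8-byte words, a 4-byte integer at address 6 occupies bytes 6 through 9, so crosses_word_boundary(6, 4, 8) is true, while the same integer at address 4 stays inside the first word.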
The number of operations can be cut in half by placing values in memory in a way that prevents them from crossing a word boundary. Each value of a basic type must be placed at a memory address that is a multiple of its size. For example, a 4-byte integer value must be placed at a memory address that is a multiple of 4: the memory addresses for the integer can be 0, 4, 8, 12, etc. A 2-byte value is placed at a memory address that is a multiple of 2, such as 0, 2, 1002, 2324, etc. You get the pattern. This way, a value never spans more than one word and takes only one operation from the CPU to read or write. To keep values from crossing a word boundary, the compiler will often introduce padding. For example, let’s take a look at the struct below:
struct Teacher {
    char department;
    int id;
    double salary;
};
department takes up 1 byte, id takes up 4 bytes, and salary takes up 8 bytes.
The department value will take up memory address 0, followed by id. The next free address is 1, but id (a 4-byte value) can’t fall on a memory address that is not a multiple of 4. So what does the compiler do? It adds padding: extra allocated memory, 3 bytes in this example, between department and id, so that id falls on a memory address that is a multiple of four (id will now occupy memory addresses 4-7). The salary value needs no padding, as it will fall on memory address 8, which is a multiple of 8.
Let’s go back to our Person struct to better understand why it consumes 16 bytes of memory. Let’s say the height double falls on memory address 0, so the height value takes up memory addresses 0-7. The 4-byte age value must fall on a memory address that is a multiple of 4, and the next free address, 8, is a multiple of 4.
The memory layout will look like this: the 8-byte height value takes up memory addresses 0-7, and the 4-byte age value takes up memory addresses 8-11. This still adds up to 12 bytes, and there don’t seem to be any padding issues.
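The Teacher layout can be checked with a short sketch like the one below. The helper names padding_for and print_teacher_layout are hypothetical, made up for this example; offsetof comes from the standard header stddef.h. I am assuming a typical 64-bit platform with a 4-byte int and an 8-byte double, so the numbers may differ elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct Teacher {
    char department;
    int id;
    double salary;
};

/* Hypothetical helper: bytes of padding needed to move `offset` up to the
   next multiple of `alignment` (0 if it is already a multiple). */
size_t padding_for(size_t offset, size_t alignment)
{
    assert(alignment > 0);

    size_t remainder = offset % alignment;
    return remainder == 0 ? 0 : alignment - remainder;
}

/* Prints the offsets the compiler actually chose for each field. */
void print_teacher_layout(void)
{
    printf("department at offset %zu\n", offsetof(struct Teacher, department));
    printf("id at offset %zu\n", offsetof(struct Teacher, id));
    printf("salary at offset %zu\n", offsetof(struct Teacher, salary));
    printf("sizeof(struct Teacher) = %zu\n", sizeof(struct Teacher));
}
```

Here padding_for(1, 4) gives 3, matching the 3 bytes of padding between department and id. Calling print_teacher_layout() from main on a platform that matches the assumptions above should report offsets 0, 4, and 8.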
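There is one last thing that I haven’t mentioned yet: a struct must itself sit at a memory address that is a multiple of the size of its largest basic field, and its total size must be a multiple of that value too. Otherwise, in an array of such structs, the second element would start at a misaligned address.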
All of the fields of the Person struct take up a total of 12 bytes, and 12 is not a multiple of the size of the struct’s largest field (8 bytes, the height double). So the compiler adds padding to the end of the struct so that the struct, and any struct placed right after it in an array, aligns correctly in memory. In this example, the compiler adds 4 bytes of padding so that the struct occupies 16 bytes of memory, satisfying the multiple-of-8 requirement.
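The trailing padding described above can be measured directly. This is a minimal sketch under the same assumptions as before (64-bit platform, 8-byte double, 4-byte int); the function names person_tail_padding and person_array_stride are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

struct Person {
    double height;
    int age;
};

/* Hypothetical helper: bytes of padding after the last field (age). */
size_t person_tail_padding(void)
{
    size_t end_of_age = offsetof(struct Person, age) + sizeof(int);
    return sizeof(struct Person) - end_of_age;
}

/* Hypothetical helper: distance in bytes between two neighbouring
   elements of an array of struct Person. */
size_t person_array_stride(void)
{
    struct Person people[2];
    size_t stride = (size_t)((char *)&people[1] - (char *)&people[0]);

    /* Array elements are laid out back to back, so the stride is the
       struct size, trailing padding included. */
    assert(stride == sizeof(struct Person));
    return stride;
}
```

Under those assumptions, person_tail_padding() returns 4 and person_array_stride() returns 16, so the height field of the second array element lands on a multiple of 8 again.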
The C example below demonstrates the result described above (the sizeof operator in C yields the number of bytes a type occupies in memory):
#include <stdio.h>

struct test {
    double test;
    int i;
};

int main(void)
{
    /* sizeof yields a size_t, which is printed with %zu */
    printf("%zu\n", sizeof(double));
    printf("%zu\n", sizeof(int));
    printf("%zu\n", sizeof(struct test));
    return 0;
}
// Result of the program execution:
8
4
16