Structs

When several variables serve a common purpose, they form a unit of information. In C, such a unit can be expressed with a struct type, which combines several variables of possibly different types and makes them accessible to the programmer by name. Colloquially, this construct is simply called a struct. The variables grouped within a struct are called fields (the C standard calls them members).

In C++, structs have been extended to classes. In addition to fields, a class has associated functions, the so-called methods. When one class is derived from another, the child class inherits the methods of the parent class. Derived classes can override so-called virtual methods and supply their own implementation. So that the correct method can be called at runtime, each instance of a class with virtual methods typically stores a pointer to a so-called vtable (virtual method table), which records which method implementations belong to the dynamic type of the instance.
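The vtable idea can be imitated by hand in plain C with a table of function pointers. The following is only a minimal sketch of the principle, not a description of what a particular C++ compiler generates; all names (shape, rectangle, square, shape_area and so on) are invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>

struct shape;

/* Hand-written "vtable": one function pointer per virtual method. */
struct shape_vtable {
    double (*area)(const struct shape *self);
};

/* Base type: every instance stores a pointer to the table of its dynamic type. */
struct shape {
    const struct shape_vtable *vtable;
};

/* "Derived" types: the base is the first field, so a pointer to the
   derived struct may be converted to a pointer to the base and back. */
struct rectangle {
    struct shape base;
    double width;
    double height;
};

struct square {
    struct shape base;
    double side;
};

static double rectangle_area(const struct shape *self)
{
    const struct rectangle *r = (const struct rectangle *)self;
    return r->width * r->height;
}

static double square_area(const struct shape *self)
{
    const struct square *s = (const struct square *)self;
    return s->side * s->side;
}

static const struct shape_vtable rectangle_vtable = { rectangle_area };
static const struct shape_vtable square_vtable    = { square_area };

struct rectangle make_rectangle(double width, double height)
{
    struct rectangle r = { { &rectangle_vtable }, width, height };
    return r;
}

struct square make_square(double side)
{
    struct square s = { { &square_vtable }, side };
    return s;
}

/* "Virtual call": the implementation is looked up in the table at runtime. */
double shape_area(const struct shape *s)
{
    assert(s != NULL && s->vtable != NULL);
    return s->vtable->area(s);
}
```

A caller only needs a pointer to the base, for example shape_area(&r.base) for a rectangle r; which area function actually runs is decided by the table the instance points to.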

Details

In general, the following applies: a struct and a class both combine several variables of possibly different types. A variable of a struct or class type thus refers to a block of memory in which multiple values are packed together. When a field is accessed, the compiler adds an offset to the start address of the block and thereby determines the memory address of the requested field.
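The address calculation described above can be reproduced by hand with the standard offsetof macro from stddef.h. This is a small sketch; the struct point and the function names are hypothetical and serve only as an illustration.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof, NULL */

/* Hypothetical example type. */
struct point {
    int    id;
    double x;
    double y;
};

/* Computes the address of field y manually:
   start address of the block plus the offset of the field. */
double *field_y_by_offset(struct point *p)
{
    assert(p != NULL);
    unsigned char *base = (unsigned char *)p;
    return (double *)(base + offsetof(struct point, y));
}

/* Returns 1 if the manual calculation yields the same address
   (and the same value) as the ordinary field access p.y. */
int offset_matches_field_access(void)
{
    struct point p = { 1, 2.0, 3.0 };
    double *y = field_y_by_offset(&p);
    return y == &p.y && *y == 3.0;
}
```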

However, exactly how the fields are arranged in memory is somewhat uncertain (at least to the author) and probably depends at least partly on the system and compiler used.

To the author's knowledge, structs and classes are always treated as one contiguous block. It does not matter where a struct is allocated (heap or stack): the arrangement of the fields within the memory block always remains the same. Furthermore, the fields of a struct are arranged within the block in the same order in which the programmer declared them; for C structs, the language standard guarantees this. This gives the programmer the ability to control the memory layout and to exploit certain peculiarities of variable placement with small tricks. However, such tricks are rarely used nowadays and are considered dangerous, which is why they are not discussed further here.
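Both statements can be checked with a few lines of C. The sketch below measures the position of a field once on a stack instance and once on a heap instance and compares the results; struct record and the function names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof, size_t */
#include <stdlib.h>   /* malloc, free */

/* Hypothetical example type. */
struct record {
    char   tag;
    int    count;
    double value;
};

/* The first field of a struct always starts at offset 0. */
static_assert(offsetof(struct record, tag) == 0, "first field starts the block");

/* Distance in bytes from the start of an instance to its 'value' field. */
static size_t value_distance(const struct record *r)
{
    return (size_t)((const unsigned char *)&r->value - (const unsigned char *)r);
}

/* Returns 1 if a stack instance and a heap instance have the same layout
   and the fields appear in declaration order, 0 otherwise. */
int layout_is_location_independent(void)
{
    struct record on_stack = { 'a', 1, 1.0 };
    struct record *on_heap = malloc(sizeof *on_heap);
    if (on_heap == NULL) {
        return 0;  /* allocation failed, nothing to compare */
    }

    int same_layout = value_distance(&on_stack) == value_distance(on_heap)
                   && value_distance(on_heap) == offsetof(struct record, value);
    int in_order = offsetof(struct record, tag) < offsetof(struct record, count)
                && offsetof(struct record, count) < offsetof(struct record, value);

    free(on_heap);
    return same_layout && in_order;
}
```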

To the author's understanding, a compiler always aligns the offsets of the individual fields. This means that each multi-byte value must begin at an address that is evenly divisible by the alignment of its type, which for the basic types is usually their size in bytes. A 4-byte value must therefore begin at an address divisible by 4, for example. As a result, when two types of different sizes follow one another, additional bytes may have to be inserted between the values to guarantee the alignment. These additional bytes are called padding bytes, and inserting them is called padding. Padding bytes are to be considered unused; their content is unspecified. The example below assumes a system where short is 2 bytes, float 4 bytes and double 8 bytes; each cell of the diagram stands for one byte, and dashes mark padding bytes.

short  a: 2 Bytes
double b: 8 Bytes
char   c: 1 Byte
float  d: 4 Bytes
char   e: 1 Byte
short  f: 2 Bytes
double g: 8 Bytes

|aaaa|aaaa|----|----|----|----|----|----|
|bbbb|bbbb|bbbb|bbbb|bbbb|bbbb|bbbb|bbbb|
|cccc|----|----|----|dddd|dddd|dddd|dddd|
|eeee|----|ffff|ffff|----|----|----|----|
|gggg|gggg|gggg|gggg|gggg|gggg|gggg|gggg|
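The layout shown in the diagram can be inspected on a concrete system with offsetof and sizeof. The following sketch declares the seven fields from the example in a struct named sample (a name chosen here for illustration) and checks that every offset is a multiple of the alignment of its type. On a typical 64-bit platform, the printed offsets would be expected to match the diagram (0, 8, 16, 20, 24, 26, 32, total size 40), but this depends on the system and compiler.

```c
#include <assert.h>
#include <stdalign.h>  /* alignof (C11) */
#include <stddef.h>    /* offsetof */
#include <stdio.h>

/* The fields from the example, in the same order. */
struct sample {
    short  a;
    double b;
    char   c;
    float  d;
    char   e;
    short  f;
    double g;
};

static_assert(offsetof(struct sample, a) == 0, "first field starts the block");

/* Prints the offset of every field and the total size of the struct. */
void print_sample_layout(void)
{
    printf("a: %zu\n", offsetof(struct sample, a));
    printf("b: %zu\n", offsetof(struct sample, b));
    printf("c: %zu\n", offsetof(struct sample, c));
    printf("d: %zu\n", offsetof(struct sample, d));
    printf("e: %zu\n", offsetof(struct sample, e));
    printf("f: %zu\n", offsetof(struct sample, f));
    printf("g: %zu\n", offsetof(struct sample, g));
    printf("sizeof(struct sample): %zu\n", sizeof(struct sample));
}

/* Returns 1 if every field offset is a multiple of the alignment of its type. */
int sample_is_aligned(void)
{
    return offsetof(struct sample, a) % alignof(short)  == 0
        && offsetof(struct sample, b) % alignof(double) == 0
        && offsetof(struct sample, c) % alignof(char)   == 0
        && offsetof(struct sample, d) % alignof(float)  == 0
        && offsetof(struct sample, e) % alignof(char)   == 0
        && offsetof(struct sample, f) % alignof(short)  == 0
        && offsetof(struct sample, g) % alignof(double) == 0;
}
```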

On a SPARC processor, for example, this alignment is mandatory; otherwise a runtime error occurs. On an Intel processor this is not necessarily the case, but a misaligned field may require two memory accesses instead of one. The author suspects that C compilers generally align fields, regardless of the processor.

A programmer may be tempted to arrange the fields in such a way that as little padding as possible is needed. However, on modern computers and compilers the effects are negligible and lead to a performance gain only in the rarest of cases.
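For illustration, the effect of such a rearrangement on the size of a struct can be measured directly. The sketch below declares the example fields once in their original order and once sorted from largest to smallest type; both struct names and the function name are invented here. On a typical 64-bit platform the sizes would be expected to be 40 and 32 bytes, but the exact numbers depend on the system and compiler.

```c
#include <assert.h>
#include <stddef.h>   /* size_t */

/* Sum of the raw field sizes, without any padding. */
#define RAW_FIELD_BYTES (2 * sizeof(double) + sizeof(float) \
                         + 2 * sizeof(short) + 2 * sizeof(char))

/* Fields in the order used in the example above. */
struct declared_order {
    short  a;
    double b;
    char   c;
    float  d;
    char   e;
    short  f;
    double g;
};

/* The same fields, sorted from largest to smallest type. */
struct sorted_by_size {
    double b;
    double g;
    float  d;
    short  a;
    short  f;
    char   c;
    char   e;
};

/* A struct can never be smaller than the sum of its fields. */
static_assert(sizeof(struct declared_order) >= RAW_FIELD_BYTES, "padding only adds bytes");
static_assert(sizeof(struct sorted_by_size) >= RAW_FIELD_BYTES, "padding only adds bytes");

/* Number of bytes saved per instance by the reordering (0 if none). */
size_t bytes_saved_by_reordering(void)
{
    size_t before = sizeof(struct declared_order);
    size_t after  = sizeof(struct sorted_by_size);
    return before > after ? before - after : 0;
}
```

A few bytes per instance only become noticeable when a very large number of instances is kept in memory, which fits the observation that the gain is rarely relevant.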
