Compiler, Linker

Statements written in C or C++ cannot be understood directly by a computer's processor. Before they can be executed, the entire code (the so-called source code) must be translated into an executable program (the so-called binary). C and C++ use a compiler for this: it translates each source file individually, and a linker then assembles the results into a complete executable program that can run directly on the computer.

Details

A processor understands only one language, its so-called machine language, which can differ from one processor type to the next. Machine language consists of raw numeric codes and is practically unreadable for humans. Assembly language provides a more or less readable notation for the individual machine instructions, but each processor type has its own assembly language.

Since programmers do not want to rewrite their programs for every new processor type, so-called high-level languages were developed. They sit above all assembly languages, and their program code is automatically converted, in various ways, into the desired assembly language and finally into machine language. A rough distinction is made between two types of high-level language: interpreted and compiled. A third type that has become popular in recent years is the virtual machine, which, strictly speaking, is only a combination of the first two. All three types are in use today in different languages.

In an interpreted language, the individual statements are read and executed one after the other at runtime. The advantage is that the program can be started directly, without the user having to convert the original program text in any way. The disadvantages are reduced speed and the fact that the program code is translated again every time the program is started. Important examples of interpreted languages are BASIC, Perl, PHP, JavaScript and basically all scripting languages.

In a compiled language, the entire program code is translated once, and the resulting program can then be started any number of times without recompiling. The advantage is that the program is very fast and runs on its own, without the compiler or the source code being present. The disadvantages are that compilation can take a lot of time and that, without recompilation, the result only runs on the processor type for which it was compiled. Important examples are C, C++, Objective-C and Pascal.

Finally, a virtual machine is a hybrid of the two types above. The term refers to an artificial processor that does not exist as hardware but is implemented purely in software. The advantage is that this artificial processor can be implemented on any computer, so that all programs written for the virtual machine can run on any computer for which it is available. A compiler converts the program code into so-called bytecode, which is the virtual machine's equivalent of a processor's machine code. When the program runs, the virtual processor runs in the background, interprets this bytecode instruction by instruction and converts it into the actual machine language. Important examples of languages based on virtual machines are Java and C# (read: C-Sharp).

C and C++ are converted into machine code by compilers. While some die-hard programmers swear by older compilers, the following are widely used today: the Visual C++ compiler from Microsoft's Visual Studio, the GNU Compiler Collection (GCC) and the LLVM compiler Clang. All three are currently available free of charge. GCC and Clang exist on all common systems and are open source, whereas Microsoft's compiler is not open source, is mainly used on Windows in combination with Visual Studio, and until recently had to be purchased.

In addition to the compiler, a linker is required for a complete translation. These days, however, the linker usually ships with the compiler and is called by it automatically, and modern programming environments configure the correct call themselves, so that people generally just talk about compiling, even when they mean compiling and linking.

Although all compilers basically have the same task of converting source code into an executable program, they differ significantly in the details. When switching from one compiler to another, errors may be reported that the previous compiler did not detect. In some cases a program may even stop running properly. In particular, if a program is to run on different systems, warnings and errors that have never been seen before can occur. The wealth of differences cannot be listed here, but anyone who dares to build the same code with different compilers or for different systems will be in for a few surprises and will learn a lot about C and C++.

The translation of C and C++

The source code is translated into a binary in roughly three steps: preprocessing, compiling, linking.

Note that a program in C or C++ often consists not of a single file but of several. Preprocessing and the actual translation (compiling) are run separately for each implementation file (.c, .cpp). Header files (.h) are not normally used as a starting point; they are only included into the implementation files during preprocessing. Finally, a single executable program is created by merging (linking) the individually compiled parts together with all required libraries. Here is a short walkthrough of a complete translation of a program:

A file written in C or C++ usually contains information such as comments or extra whitespace that the programmer has inserted into the program text but that the compiler does not need. The preprocessor removes such purely supporting information from the source code. In addition, the programmer can use the preprocessor to include other files, define macros, and control conditional compilation and error messages. A comprehensive explanation of how to control the preprocessor can be found in the Preprocessor section.

The resulting preprocessed file is then compiled. This translation can be divided into several steps. First, all information about types, variables, functions, classes and so on, as well as the entire program structure, is collected and stored in an internal data structure of the compiler. This phase is called parsing. During this first phase, the code is checked for correct syntax, i.e. the correct arrangement of its individual parts.

This is followed by a consistency check, which verifies that all symbols used have been declared, that the types match, that no access restrictions have been violated, and so on. If the compiler has found no errors up to this point, it starts translating the individual statements into assembly code, piece by piece. If desired, the code can then be optimized, which can reduce the size and, in particular, improve the speed of the final program. At the very end, an assembler takes over the translation into machine code.

The result of translating a single file is saved in a so-called object file. The linker then combines all object files into a complete program. For each object file, it checks which definitions of symbols with external linkage are still missing and looks for them both in the other object files that have just been compiled and in the precompiled libraries, such as the standard library. If the linker cannot find a definition anywhere, a linker error occurs and the program cannot be completed. If everything is complete, the linker produces a single file containing all translated parts, properly linked together to form an executable program.

Note that these steps may be more or less distinct depending on the compiler. Note also that mixing C and C++ object files can lead to linker problems, which can be solved with the extern keyword (in the form extern "C").

When the program is finally executed, a so-called loader loads the program into memory and initializes the runtime system. This in turn starts the program, ensures that it is processed safely and successfully, and also, for example, loads so-called dynamic libraries when they are needed. However, all of this is no longer part of the actual translation of the program and is often system-specific, so it is not explained further here (for the time being).

Final Remark

The specific flow of compilation can be configured down to the smallest detail, but it rarely plays a role in programming: specific settings for each part of the translation are usually secondary to the actual programming work. In everyday programming, a programmer will at most write a few preprocessor directives, iron out a few (many) compiler errors, and track down a few linker errors.

Next Chapter: Runtime-System