Characters

The source code of C is written in plain text files (ASCII files). Fewer than 100 characters are enough to cover the language completely. Many characters serve several different purposes at once. The characters and their uses are listed here.

Printable characters

' ' Single quotation marks, also known as apostrophes or single quotes. Always appear in pairs in the source code.

" " Double quotation marks. Often simply called quotes. Always appear in pairs in the source code.

+ Plus sign. Flag of Switzerland and first-aid-kits in computer games.

- Minus sign. Also called hyphen.

* Multiplication sign. Also called asterisk.

/ Division sign. Also called slash.

% Percent sign.

= Equal sign.

() Round brackets. Correctly called parentheses. Often simply called brackets, which confuses them with the SQUARE brackets [], but in practice this is not an issue because the distinction between the two is clear from the context. Always appear in pairs in the source code. The brackets must be well-formed, which means that every closing bracket ) must have a unique closest preceding opening bracket (.

[] Square brackets. Often simply called brackets, which confuses them with the ROUND brackets (), but in practice this is not an issue because the distinction between the two is clear from the context. Always appear in pairs in the source code. The brackets must be well-formed, which means that every closing bracket ] must have a unique closest preceding opening bracket [.

{} Curly brackets. Always appear in pairs in the source code. The brackets must be well-formed, which means that every closing bracket } must have a unique closest preceding opening bracket {.

< Less-than sign.

> Greater-than sign.

; Semicolon sign.

: Colon sign.

, Comma sign.

. Full stop sign. Also called period, point or most often, simply dot.

! Exclamation mark.

? Question mark.

~ Tilde sign.

^ Caret sign. Sometimes called power, although in C it does not denote exponentiation but the bitwise exclusive-or.

| Bar sign. Correctly called vertical bar. Also called pipe.

& And sign. Correctly called ampersand.

\ Backslash sign.

# Number sign. Also called hash.

Space, Tab, Enter. Collectively called whitespace. Used in the source code to separate individual elements from one another and to present the code structure more clearly.

_ Underscore sign. The underscore sign is considered an extra letter in C. It is thus possible, for example, to declare a variable with the name _my_variable. Very often, compiler- or architecture-dependent macros are defined with one or two (rarely three) prefixed underscores (e.g. _T_SIZE or __restrict__). Because the underscore is an almost invisible line, certain code styles use it to make multi-word symbols easier to read.

$ Dollar sign. The dollar sign has no meaning of its own in C, but many compilers accept it as an extra letter in names as an extension. With such compilers it is possible, for example, to declare a variable with the name $the$super$variable. However, this is rarely done nowadays.

@ At sign. Has no meaning in C and results in syntax errors when used. However, some compilers/linkers of C use the character for name-mangling, more specifically to mark the symbol name with the attribute associated with a type.

` Accent sign. Has no meaning in C and results in syntax errors when used.

Digits and Letters

0-9. The Arabic digits 0 to 9 are mainly used for fixed values (literals), such as integer numbers or floating point numbers. When the first digit of an integer is a zero 0, that number is interpreted as an octal number. Digits can also appear as part of a symbol name as long as they are not the first character in the name. For example, it is possible to declare variables with names vec1, image5x5 or m29513, but not 99bottlesBeer.

a-z A-Z. The Latin lower and upper case letters are mainly used for symbols, e.g. for variable names or function names. However, some keywords are predefined which may not be used as symbols as they represent built-in names of the language. In addition, some single letters have a special meaning depending on the situation, which is listed in more detail below.

0x 0X h. A zero followed by a lowercase or uppercase x is interpreted as the prefix of a hexadecimal number. Some compilers even accept a lowercase h at the end of an integer as a hexadecimal marker. This suffix is not part of the standard, though, and should not be used in portable code.

a-f A-F. The lowercase or uppercase letters a to f are interpreted as the digits 10 to 15 of hexadecimal numbers.

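0b 0B b. A zero followed by a lowercase or uppercase b is interpreted as the prefix of a binary number. This prefix only became part of the standard with C23; before that, it was merely an extension offered by some compilers. Some compilers even accept a lowercase b at the end of an integer as a binary marker. The suffix is not part of the standard and should not be used in portable code, and the prefix should be avoided in code that has to compile with pre-C23 compilers.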

e E. A lowercase or uppercase e is used to introduce the exponent of a floating point number.

p P. A lowercase or uppercase p is used to introduce the exponent-of-2 of a hexadecimal floating point number.

f F. A lowercase or uppercase f is used as a suffix for float-Type literals.

l L. A lowercase or uppercase l is used as a suffix for a large number. On an integer literal it denotes the type long int, on a floating point literal the type long double.

ll LL. A doubled lowercase or uppercase l is used as a suffix for a very large integer. It denotes the type long long int.

u U. A lowercase or uppercase u is used as a suffix for an unsigned integer, meaning an integer type that cannot hold negative values.

Control characters

The following characters are not intended to be actual characters visible on the screen, but rather encoded control sequences.

\0. The ASCII character number 0 is called the zero-character (or null character) and is required in C in particular for the termination of strings. It is implicitly appended to the end of every string literal. To use the character in C as an ASCII character, it must be written using the escape sequence \0.

\a. The ASCII character number 7 is called alert. In earlier years, a beep could be heard when this character was printed to the console, which is why the character is sometimes called the bell. Nowadays, the character has no special meaning. To use the character in C as an ASCII character, it must be written using the escape sequence \a.

\b. The ASCII character number 8 is called backspace and was - in earlier days - equivalent to pressing the backspace key. Even nowadays, this character can still be used to detect such a key-press in certain consoles. To use the character in C as an ASCII character, it must be written using the escape sequence \b.

\t. The ASCII character number 9 is called tab and corresponds to pressing the Tab key. When outputting to consoles, this character can be useful for string formatting. The appearance of the tab character in the source code is interpreted as whitespace. To use the character in C as an ASCII character, it must be written using the escape sequence \t.

\n. The ASCII character number 10 (Hexadecimal 0x0a) is called newline and corresponds to pressing the Enter key (depending on the system, a line break is encoded as \n on Unix, \r\n on Windows and \r on Mac systems prior to Mac OS X). This can be useful for string formatting, but also for keyboard input. The appearance of the newline character in the source code is treated as whitespace. To use the character in C as an ASCII character, it must be written using the escape sequence \n.

\v. The ASCII character number 11 (Hexadecimal 0x0b) is called vertical tab and was used in earlier days to control line devices (e.g. line printers). Nowadays, the character has no special meaning. To use the character in C as an ASCII character, it must be written using the escape sequence \v.

\f. The ASCII character number 12 (Hexadecimal 0x0c) is called form feed and was used in earlier days to control line devices (e.g. line printers). Nowadays, the character has no special meaning. To use the character in C as an ASCII character, it must be written using the escape sequence \f.

\r. The ASCII character number 13 (Hexadecimal 0x0d) is called carriage return and corresponds to pressing the Enter key (depending on the system, a line break is encoded as \n on Unix, \r\n on Windows and \r on Mac systems prior to Mac OS X). This can be useful for string formatting, but also for keyboard input. The appearance of the return character in the source code is treated as whitespace. To use the character in C as an ASCII character, it must be written using the escape sequence \r.

Trigraphs, Digraphs and <iso646.h>

In earlier times, it was not guaranteed that all the characters listed above could be recognized by the compilers of the time (for C as well as for other languages). While today's compilers usually recognize 7-bit ASCII or even richer encodings without problems, the C language was originally based on a greatly reduced character set: the so-called invariant code set of ISO/IEC 646. Characters listed above that are not part of this set had to be encoded with a two- or three-character replacement, called a digraph or trigraph. In addition, the standard header iso646.h still exists today; its macro definitions were meant to support older compilers and to ease the transition from other languages.

All of these replacements have become obsolete in practice and are today only a relic of earlier times. Trigraphs were removed from the standard with C23 and are disabled by default in some compilers. However, all three kinds of replacement are listed below for the sake of completeness.

Trigraphs denoted one-to-one substitutions of specific characters throughout the code. The occurrence of a trigraph (whether within a string or not) had EXACTLY the same meaning as the occurrence of the corresponding character.

Digraphs denote replacements of certain characters that occur in the source code. If a digraph appears inside a string enclosed in double quotes "" or inside a character constant enclosed in single quotes '', it is NOT replaced. Everywhere else, the occurrence of a digraph has EXACTLY the same meaning as the occurrence of the corresponding character.

The substitutions in the standard header iso646.h are used for operators. These are normal macros which follow the usual rules of the preprocessor.

+-----------------+
|    Trigraphs    |
+---------+-------+
|   ??(   |   [   |
|   ??)   |   ]   |
|   ??<   |   {   |
|   ??>   |   }   |
|   ??=   |   #   |
|   ??!   |   |   |
|   ??'   |   ^   |
|   ??-   |   ~   |
|   ??/   |   \   |
+---------+-------+
+----------------+
|    Digraphs    |
+--------+-------+
|   <:   |   [   |
|   :>   |   ]   |
|   <%   |   {   |
|   %>   |   }   |
|   %:   |   #   |
+--------+-------+
+---------------------+
|     <iso646.h>      |
+------------+--------+
|   and      |   &&   |
|   and_eq   |   &=   |
|   bitand   |   &    |
|   bitor    |   |    |
|   compl    |   ~    |
|   not      |   !    |
|   not_eq   |   !=   |
|   or       |   ||   |
|   or_eq    |   |=   |
|   xor      |   ^    |
|   xor_eq   |   ^=   |
+------------+--------+

The use of such substitutions nowadays is discouraged.
