Important notice
Panels such as this contain summaries, hereafter called “checklists” of the main points of the related segment that are vital for the proper understanding of the whole complex. If anything in these panels remains unclear please reread the preceding paragraph(s). Feel free to comment and point out any mistakes.
What is C?
C is a general-purpose, imperative computer programming language, supporting structured programming, lexical variable scope and recursion, while a static type system prevents many unintended operations. By design, C provides constructs that map efficiently to typical machine instructions, and therefore it has found lasting use in applications that had formerly been coded in assembly language, including operating systems, as well as various application software for computers ranging from supercomputers to embedded systems.
C was originally developed by Dennis Ritchie between 1969 and 1973 at Bell Labs, and used to re-implement the Unix operating system. It has since become one of the most widely used programming languages of all time, with C compilers from various vendors available for the majority of existing computer architectures and operating systems. C has been standardized by the American National Standards Institute (ANSI) since 1989 (see ANSI C) and subsequently by the International Organization for Standardization (ISO).
Source: Wikipedia: C (programming lanugauge) 23rd April, 2016
Explanation
General-purpose languages (GPLs) are designed to be versatile so that they can solve many different types of problems instead of just one type of problem. Specialized languages are designed to only solve one or few different types of problems, e.g. Mathematica is designed to solve math problems.
GPLs often support multiple so-called (programming) paradigms such as a language being imperative and object-oriented. A programming paradigm is a way a language goes about solving problems.
C, for instance, is an imperative language, which means that you as the programmer tell the program specifically HOW to solve a problem just like an emperor.
In contrast to C, there are declarative languages such as Haskell, LISP or SQL, which rather require the programmer to describe WHAT the problem is. It is the compiler and/or the runtime system that takes care of HOW the problems described with a declarative language are solved.
C++ and C# are further developed, object-oriented programming languages and successors of the C programming language. Object-orientation also is a programming paradigm.
Help! My computer does not speak any C.
C, as all mentioned languages, is a high-level language, which means it is more abstract than the low-level languages which are closer to machine language code. One important of numerous advantages of the high-level abstraction is the platform-independence, provided a translation routine, a so-called compiler exists for each platform. These compilers are needed because computers are narrow-minded beasts that only speak but one language: machine language.
The translation process of a program written in C can roughly be summarized as follows. A compiler receives your program and checks whether your code is a valid program. If so it translates it into assembly code, which is then translated into object code. These object files are provided to a linker, which patches them together to create the final executable which consists of machine code.
Checklist: “What is C?”
The ways a programming language goes about solving problems are called programming paradigms.
C is a general-purpose, high-level, imperative language.
Its offspring C++ and C# are general-purpose, high-level, imperative, object-oriented languages.
In contrast to imperative languages, there are declarative languages, such as Haskell, LISP or Prolog.
The translation process of a C program includes validation and translation from C into assembly code into object code into machine code.
The programs taking care of this process are called compiler and linker.
Structure of a C-Program
Minimal program
Introductory example
#include <stdio.h> #include "fancyFunctions.h" int main() { int arg = 5; aFancyFunction(arg); printf("Hello World!"); return 0; }
Observations from the introductory example
With the #include
directive you can include so-called headers that include function declarations and sometimes even their definitions. We will discuss at a later point what exactly declaration and definition mean. For now it is okay to think of #include
as an import directive.
You might have noticed that <stdio.h>
is wrapped in angle brackets, while "fancyFunctions.h"
is wrapped in quotation marks. That is because stdio.h
and others are special so-called libraries with a lot of efficient premade functions that you as a programmer can and should make use of. However, "fancyFunctions.h"
is a custom header created by the user that is in the same directory as the currently opened .c file.
Each and every C program has exactly one main()
function. This is where the program starts, that is why a program without a main function is invalid. Note that C is a case-sensitive language, i.e. main()
is something else than Main()
.
In line 5 the variable arg is defined.
In line 6 the function aFancyFunction()
with the parameter arg is called.
In line 7 the function printf()
with the parameter “Hello World!” is called.
Note, if you have not presumed it already, that printf()
is declared in <stdio.h>
and provided we have a valid C program aFancyFuntion()
is at least declared in fancyFunctions.h
.
In line 8 the function returns 0. Let me preempt that the return value of the main function is also called exit code, where 0 means termination without any trouble. Each function, if is not of the type “void”, has a return value of its specified type, e.g. the main function returns an integer (this is what int stands for).
Checklist: a minimal C program
With the #include
directive i can import external dependencies.
Each C program has exactly one int main()
( or void main()
) function.
C is case sensitive. This means that main()
is a different function than Main()
.
Variables
Introductory example
#include <stdio.h> #define M 5 #define N(a,b) a > b ? 1:0 int globalA = 5; int globalB = 3; int main(){ // a += b <=> a = a + b globalA += 5; int globalB = 2; globalB += 5; double localC = globalB + 0.5; float localD; localD = 3.14; int typecastedE = (int) localD + 0.2; int testM = M; int testAGreaterB = N(globalA, globalB); printf("A:%d B:%d C:%.2f D:%.2f E:%d M:%d N:%d\n", globalA, globalB, localC, localD, typecastedE, testM, testAGreaterB); return 0; }
Program output
Declaration, initialization, data types, global and local variables
To use variables, you first have to declare them and then initialize them with their initial value. This can be a done in a single statement.
Declare means to make the variable name known to the compiler while initializing means assigning an initial value to the variable.
And to stress this out as early as possible: the =
operator is neither about an equation, nor a logical operator. For the latter case there’s the ==
. The =
operator is (if not overloaded) an assignment operator.
action | implementation |
---|---|
declaration | int i; |
initialization | i = 0; |
both | int j = 1; |
multiple declarations/initializations | int k, l = 5, m = l + 1; |
As you surely have noticed, there’s a data type before each variable name. This is to set a data type to the variable in order to clarify how the variable value has to be interpreted and how much disk space the compiler has to reserve. The size of data type is also compiler-dependent and hardware-dependent. That is why we will from now on say “typical size”. If the exact size of an integer on a current system has to be determined you use sizeof(int)
. Below is a table with typical sizes for different data types.
data type | symbol | typical size | value range |
---|---|---|---|
integer | int | 4 bytes | $ -2^{31}\text{ bis } 2^{31}-1 $ |
unsigned integer | unsigned int | 4 bytes | $ 0 \text{ bis } 2^{32}-1 $ |
floating point number | float | 4 bytes | $ \approx \pm 10^{38} $ |
double precision float… | double | 8 bytes | $ \approx \pm 10^{308} $ |
character | char | 1 byte | $ 0 \text{ to } 2^{8}-1 $ |
string | char[i] | i+1 bytes | $ 0 \text{ to } 2^{8}-1 \text{ per character } $ |
Notes:
- Internally the characters of the data type
char
are saved as integers. On most systems, these integers are then converted back to characters using the ASCII-code. As of now (2016) the C-standard does not require that, though. - C does not have a distinct data type for strings. Instead, strings are saved as arrays of chars. To mark the end of the string a binary zero is added at the end of the array. That is why a string of length n requires n+1 bytes of memory. Most of the time strings are accessed through pointers. Both, arrays and pointers will be introduced later.
A variable that is not inside a block, i.e. it is not enclosed by a pair of curly braces {...}
has a global scope. This means that it is known in the whole file. As a consequence, you could make use of that variable in every function in that file. An example of a global variable is in line 7 of the code example.
In contrast to global variables, there also so-called local variables. They are only defined in their current scope (function block or loop). If you tried to use their value out of scope you will cause an error that prevents the program from being compiled. Examples for local variables are a, b and c in line 11 of the example.
By implication, this means that a variable of the same name can be used in different scopes, e.g. functions. Self-evidently they need to be declared and initialized newly there.
You might wonder what happens if one had to variables of the same name, one local and one global variable. What happens is that the local variable covers the global variable in that scope. This means the variable has the value of the local definition. Example: double g
in line 18 covers int g
from line 6.
Typecasting
Every time when two variables of different types are linked through an operator, there will be either an error, a user-defined function will be executed or a type conversion occurs. At this stage, I will exclusively talk about the third possibility. When a variable’s data type is changed to a different one we also call that typecasting. In our example, this will happen in lines 17 and 21.
In line 17 we add an integer to a float. Let’s take a look at the right side first. In the first step, the integer is promoted to a float. We call it a promotion because the floating point number has a bigger value range than the integer. Thus, when adding floats and doubles the float would be promoted to a double. Due to the fact, that I did not manually force the conversion, but it happened automatically this kind of typecasting is called implicit.
Then follows the calculation: float + float = result (which is a float). Since on the right side of the equation, an integer is demanded (first right-hand operand), that float is converted (demoted) into an integer. This process is called explicit typecasting, simply done by putting the data type in parentheses “()” just before the variable name. This demotion is done by simply forgetting all decimal places and NOT by rounding it. Then it is converted back into a float, because that it has to be added to the float. Note that the original decimal places are not restored, because they have been forgotten in the last step.
#define
With the directive #define
we can define constants. In our example, a so-called preprocessor will simply replace all occurrences (in line 3 and 11) of M with 5. It is highly recommendable to use this directive for fixed constants, e.g. for an approximation of the number pi. The benefit of doing this is simply that
you can replace a lot of values in your code, just by changing one number. You might wonder why we should not simply declare a variable float pi
instead.
The reason is that it is more efficient to use the #define
directive since, as mentioned above, it replaces all occurrences of the keyword and does not reserve any random access memory.
You can also use #define
to conveniently create simple macros. The one in line 4 (used in line 24) simply compares a and b and if a is greater than b, the result will be 1, otherwise, it will be 0. The brief notation x?y:z
stands for if(x) return y; else return z;
. You can find more about this in the control structures section.
Note: You might have noticed, that there is no semicolon after the all the directives starting with #. You must never add semicolons after blocks that begin curly braces or hashtags. Structs (or classes in C++) are an exception, but we’ll come to that later.
Checklist: variables
A variable must be declared and initialized before it can be used. After that, each variable has a value and a data type.
Data types of fundamental importances are int, float, double, char and string. However, chars are internally represented by integers and strings are char-arrays terminated with a binary zero.
Typecasting allows you to convert a variable of data type A into a variable of data type B, so long as they are compatible. That means, that a conversion is defined (can be defined/overwritten by the programmer). There is explicit and implicit typecasting.
With the #define
directive one can program simple macros. These replaces the keyword with the defined expression or can act as simple functions.
Local variables cover global variables.
Pointers
Any real world application contains tons of pointers. That is why it is very important to understand this basic concept.
Code example
int main(){ int a = 0; int* b = &a; /* Nur C++: * int& c = a; */ printf("a:%d b:%d *b:%d", a, b, *b); return 0; }
Output 1. execution
Output 2. execution
Memory image
Exercise: Determine&a, a, &b, b und *b
for the given memory image. (How to do that is explained in subsequent paragraphs)
Notice:
A common abbreviation for pointer is ptr. In the source code pointer variables will have the additional prefix or suffix ptr.
What are pointers?
Pointers are variables as well. At their declaration, they are denoted with an additional star behind their data type. An int*
would be called integer pointer. Integer pointers do not contain arbitrary numbers, but the memory addresses of integers.
The assignment int* a = &b;
means, that an integer pointer a is assigned the adress of an integer b.
In this exact context, the & is called the address operator.
So far, so easy. What confuses many is the following part. Say I got a pointer int* b to a. How can I now determine the value of a
? This is where the so-called dereference operator comes into play. Unfortunately in another context is has a different meaning. Yes, it is the asterisk *.
While the asterisk in int* b tells the compiler, that b is a pointer, int c = *b; assigns the dereferenced value of b to c. The latter simply means that the compiler looks up what the value is in the memory cell address given by b and it is only this context, that the asterisk is called dereference operator.
Notice:
$ \text{int* a = b} \Leftrightarrow \text{int *a =b} $
Notice: In C++ the following, so called aliasing is possible: int& x = y
. This means, that the address of x
is identical to the address of y
. As a result, x
is an alias (synonymous) name of y
.
Call by value and call by reference
Code example:
#include <stdio.h> int main(){ /* ptr: common abbreviation for pointer */ int* ptrA; int a = 0; ptrA = &a; printf("Inside of main() before callByValue() ist a:%d \n", a); callByValue(a); printf("Inside of main() after callbyValue() ist a:%d \n", a); printf("\n"); /* entspricht callByReference(&a), da ptrA == &a; */ callByReference(ptrA); printf("Inside of main() after callbyReference() ist a:%d \n", a); return 0; } void callByValue(int x){ printf("Inside of callbyValue() before x++ ist a:%d \n", x); x++; printf("Inside of callbyValue() after x++ ist a:%d \n", x); } void callByReference(int *x){ printf("Inside of callbyReference() before x++ ist a:%d \n", *x); (*x)++; printf("Inside of callbyReference() after x++ ist a:%d \n", *x); }
Output
Inside of callbyValue() before x++ ist a:0
Inside of callbyValue() after x++ ist a:1
Inside of main() after callbyValue() ist a:0
Inside of callbyReference() before x++ ist a:0
Inside of callbyReference() after x++ ist a:1
Inside of main() after callbyReference() ist a:1
Functions are explained in more detail in a later section.
Let’s take a closer look at this example. The variable int a
is initialized with 0 and its value is passed to the function callByValue
. What happens is something like this: int x = a;
. The value of a
is passed to x
which is an independent variable. This passing of a value, but not the variable itself is called “calling by value”. Since our manipulations do not change anything about a
its value stays 0.
This is differnt if we call our functions by reference. What happens here is equivalent to the following code: int* x = &a; (*x)++;
. This can be simplified by substitution into this expression: (*&a++)
. Since the *
and &
operators cancel out each other, simply a++
remains. We can derive that a
is actually incremented. The whole process is called “calling by reference”. If I edit dereferenced x
, which is (*x)
or simply *x
, then I simply directly apply changes to a
.
Function Pointers
Code example
void printNormal(double x){printf("%f\n", x);} void printTwoDecimals(double x){printf("%1.2f\n", x);} int main(){ double e = 2.718281828; void (*funcptr)(double); funcptr = &printNormal; (*funcptr)(e); funcptr = &printTwoDecimals; (*funcptr)(e); return 0; }
Output
2.72
What Are Function Pointers?
This again is expecting that the reader knows the concept of functions, but shall not hinder us to introduce function pointers.
Function pointers are no different from “regular” pointers, except that they, as the name implies, point to functions.
At first, two functions are defined that have the same number of arguments/parameters: one double
. The first function should print all decimals of a double
, while the second should only print the first two decimals.
In line 8 a void-pointer is defined in a slightly different way: with parentheses around *funcptr
. As a result, this function pointer can only point to functions that return nothing (void). Its name is funcptr
. In the second set of parentheses we have to specify the data types of the arguments, in this case, a double
.
The rest of this example should be fairly obvious.
Checklist: Pointers
Pointers are variables that contain addresses of other variables, i.e. they can also contain addresses of other pointers.
Pointers are declared according to this scheme: datatype* ptr
.
The address operator is &
and the dereferencing operator is *
. Please do not confuse the dereferencing operator with the declarative asterisk when declaring pointers.
Pointers can also point to functions. This is especially useful in cases with user interaction or when the function to be called is determined at runtime.
When a value is passed to a function then this is called “calling by value”. On the other hand, if you pass a reference, which is a pointer, then we call this “calling by reference”.
When calling by value, the value of the variable is not changed outside of the function, because the function receives its own copy of it. Contrary, if you call by reference, the function only “shares” (because the memory address is passed to the function) the variable with other parts of the program.
Operators
Checklist: Operators
Denotation | Operator | Example |
bitwise OR | | | $ 4|5 \Leftrightarrow 100_2 | 101_2 \Leftrightarrow 101_2 \Leftrightarrow 5 $ |
bitwise AND | & | $ 4 \& 5 \Leftrightarrow 100_2 \& 101_2 \Leftrightarrow 100_2 \Leftrightarrow 4 $ |
bitwise XOR | ^ | $ 4 \text{^} 5 \Leftrightarrow 100_2 \text{^} 101_2 \Leftrightarrow 001_2 \Leftrightarrow 1 $ |
left shift | << | $ 4<<2 \Leftrightarrow 100_2 << 2_{10} \Leftrightarrow 10000_2 \Leftrightarrow 16 $ |
right shift | >> | $ 16>>2 \Leftrightarrow 10000_2 >> 2_{10} \Leftrightarrow 100_2 \Leftrightarrow 4 $ |
Denotation | Operator | Example |
logical OR | || | $ \text{If int i = 1, j = 2;} \\ \Rightarrow (i<j)||(i>j) \Leftrightarrow 1 $ |
logical AND | && | $ \forall \text{ int i, j } (i<j)\&\&(i>j) \Leftrightarrow 0 $ |
equality | == | $ \text{If int i = 1, j = 2;} \\ \Rightarrow i == j \Leftrightarrow 0 $ |
inequality | != | $ \text{Falls int i = 1, j = 2;} \\ \Rightarrow i \text{ != } j \Leftrightarrow 1 $ |
Denotation | Operator | Example |
Little Slugs | += -= *= /= %= ^= &= |= <<= >>= | $ a += b \Leftrightarrow a = a + b; $ rest analogous. |
Modulo | % | $ 5 \% 2 \Leftrightarrow 1; $ remainder when dividing integers. |
dot operator (introduction in later section: structs) | . | Z.height = 5; sets attribute height of Z to 5. |
arrow operator (introduction in later section: structs) | -> | sptr->height = 5; sptr->height is an abbreviation for (*sptr).height |
Control Structures
if…else if…else
Code Example:
#include <stdio.h> int main(){ int i = 0; printf("Enter an integer!\n"); scanf("%d", &i); if((i%2)==0){ printf("Your number is even!\n"); } else if(i==7){ printf("Hooray, it is the 7!\n"); } else{ printf("Your number is odd!\n"); } /* If there's only one command in {}, you can renounce them */ /* Absolutely equivalent: */ /* printf("Enter an integer!\n"); scanf("%d", &i); if((i%2)==0) printf("Your number is even!\n"); else if(i==7) printf("Hooray, it is the 7!\n"); else printf("Your number is odd!\n"); */ return 0; }
Output, if you the remainder of the division equals 0.
Output, if i is 7.
Output otherwise
The function scanf()
from the <stdio.h>
-library expects user input and saves it in the variable &i
. The modulo operator (%), which returns the remainder of an integer division.
Important: C does not know booleans (true
und false
)! Instead false
is represented by 0 and everything else represents true
. C++ knows booleans.
For Loops
Code Example:
#include <stdio.h> int main(){ /* Beispiel 1 iterative computation fibonacci */ int i, n = 0, curFib=1, nxtFib = 1, tmp; printf("Please enter an integer n.\n"); scanf("%d", &n); for(i=0;i<n-1;i++){ tmp = nxtFib; nxtFib += curFib; curFib = tmp; } printf("The %d-th fibnoacci number is %d.\n", n, curFib); /* Beispiel 2 iterative computation faculty */ int nthFac = 1; printf("Please enter an integer n.\n"); scanf("%d", &n); if(n==0) nthFac = 1; else{ for(i=n;i>1;i--){ nthFac *= i; } } printf("The faculty of %d is %d.\n", n, nthFac); return 0; }
Output for $n = 5$
5
The 5-th Fibonacci number is 5.
Enter an integer n.
5
The faculty of 5 is 120.
So what are the arguments of the for
loop? Let’s take a look at line 12 of the example: for(i=0;i<n-1;i++){...}
contains three statements delimited by semicolons inside the parentheses. The first statement is an assignment. The iterator (the counting variable) is assigned an integer. In this case i=0;
. The second statement is a break condition, in other words, an expression to check. If that expression evaluates to 0, then the loop continues with the next iteration. The last statement defines what happens after a successful iteration of the loop, in this examples, our iterator is simply increased by one.
Note: You can always write for
loops as while
loops and vice versa. Although, there’s no point in converting a for
loop syntax into a while
loop syntax, if the number of iterations is known beforehand. It only makes the code harder to read.
Curious note: You can build a while
loop using a for
loop simply by leaving out the first and last statement: for(;condition;)
. In other words while(1)
(C) or while(true)
(C/C++) is the same as for(;;)
.
If it hasn’t become obvious when to use for
and when to use while
loops in the previous two notes, then here’s the explanation. Since both loops are equivalent, there technically is no reason to prefer one over the other. However for practical reasons, such readability and in order to avoid infinite loops it is always a good idea to use for
loops when it comes to iterating through an object of known dimensions.
Exercise: Convert all for
loops in the given code example into while
loops. You might want to read the next paragraph before though.
While Loops
Code Example
int main() { /* Which is the first Fibonacci number greater than x? */ int i = 1, curFib = 0, nxtFib = 1, x, tmp; printf("Please enter a big integer.\n"); scanf("%d", &x); while(curFib<x){ tmp = nxtFib; nxtFib += curFib; curFib = tmp; i++; } printf("The %d th Fibonnaci number greater than %d\n", i, x); printf("It is %d.\n", curFib); return 0; }
Output for $x = 1000$
1000
The 17th is first Fibonacci number greater than 1000.
It is 1597.
The parentheses after the keyword while
contain the break condition for the loop. Since nothing else belongs in these parentheses I must assure, that inside the body of the while
-block something is done that the break condition evaluates to true eventually. In the given example this is achieved in line 15: i++;
.
while
should always be used (or the do-while
variation) if the number of loop iterations is previously unknown to the programmer and/or dependent on including but not limited to user input, software and hardware environment, the input of i/o devices. In this example, we even want to print the number of iterations needed until the Fibonacci number is finally greater than 1000.
Exercise: Rewrite this example exclusively using for
loops. Should you encounter a problem you might want to read about inifinite loops.
Do-while loops
Codebeispiel
#include<stdio.h> int main(){ /* Example 1 */ int i=10; do{ i++; }while(i<10); printf("i:%d\n", i); /* Example 2 */ i=0; do{ i++; }while(i<10); printf("i:%d\n", i); return 0; }
Output
i:10
How does this output come about? Easily explained: the first execution of the do
-block is independent from the condition given in the parentheses just after the while
keyword. Only after the first iteration this break condition is checked and only if it evaluates to false the loop will be left.
Infinite loops and the break
statement
Code example
int main(){ /* Example 1: "wanted infinite loop" */ /* admittedly this example is slightly artificial since one could simply take q!='q' as break condition...*/ /* This initialization is valid. 0 is the ASCII representation of a character. 65 would be 'A'. */ char q = 0; while(1){ printf("Enter a letter!\n"); scanf(" %c", &q); if(q=='q'){ printf("Yay, a q!\n\n"); break; } printf("Your letter was not a lowercase q!\n"); } /* Example 2: "wanted infinite loop" */ char choice = 'a', solution = 'q', guess = 's'; int prim = 4, modulo, i; while((choice!='z')&&(choice!='b')){ printf("What do you like better? Letters or integers?\n"); printf("For integers press z, for letters b.\n"); scanf(" %c", &choice); if(choice=='z'){ while(1){ printf("Enter a prime!\n"); scanf("%d", &prim); for(i=2;i<prim;i++){ modulo = prim%i; if(modulo==0){ printf("Your number was no prime!\n"); break; } } if(modulo!=0) {printf("Thx!\n");break;}; } } else if(choice=='b'){ printf("Then guess my (lowercase) letter!\n"); while(guess!=solution){ scanf(" %c", &guess); if(guess<solution){ printf("It is further in the back of the alphabet!\n"); } else if(guess>solution){ printf("Nope. You've gone too far!\n"); } } printf("Correct!\n"); } if((choice=='z')||(choice=='b')) break; } return 0; /* Example 1: "unwanted infinite loop" */ /* int j; for(j=0;j<100;j++){ if(j==42){ j=0; } } */ }
Output Example 1
x
Your letter was not a lowercase q!
Enter a letter!
q
Yay, a q!
Output Example 2 with char choice = z
.
For integers press z, for letters b.
g
What do you like better? Letters or integers?
For integers press z, for letters b.
z
Enter a prime!
4
Your number was no prime!
Enter a prime!
5
Thx!
Output Example 2 with char choice = z
.
For integers press z, for letters b.
b
Then guess my (lowercase) letter!
a
It is further in the back of the alphabet!
x
Nope. You’ve gone too far!
q
Correct!
I don’t think an explanation is really necessary here. The code should be self-explanatory. The most important pieces are summarized in the following paragraphs.
With break
I immediately break out of the loop. This way I can for whatever reason avoid the check of the break condition and further iterations.
With while(1){..}
I can create an endless loop on purpose. I must at some point (within the block delimited by the curly braces) break it with the break;
statement.
In some situations, endless loops that are being broken are an ugly, but easily implemented solution. Use as sparingly as possible and with great caution.
The third example is an unwanted infinite loop. Since while
and for
loops are equivalent and as the example shows for
loops can be endless loops as well. However, under normal circumstances, it is much easier to erroneously create an infinite loop with while
than with for
.
As theoretical insight shows, there cannot exist an algorithm with finite runtime that can determine whether a program terminates within finite or infinite time. Exercise: find a convincing argument.
switch-case
Code Example
int main(){ int wish; printf("Which fruit would you like to buy?\n"); printf("0:pears 1:strawberries 2:plums \n"); scanf("%d", &wish); switch(wish){ case 0: printf("Yummy pears!\n"); break; case 1: printf("Fresh strawberries!\n"); break; case 2: printf("Juicy plums!\n"); break; default: printf("Sadly, we do not have this fruit in our sortiment."); break; } return 0; }
Output for wish = 1
0:pears 1:strawberries 2:plums
1
Fresh strawberries!
In contrast to programming languages such as PHP, the argument of the switch
must be an integer. In some cases, it is advisable to use #defines
to mask these integers behind meaningful macro names. For the respective value of wish
a function is executed. If there’s no suitable case
the command behind default:
is executed.
switch-case
is significantly more efficient to use than if...else if...else
. However, don’t forget to break after each case
. Otherwise, the next case will be checked as well. Which in some cases may be useful, e.g. if in some case the switch
-variable value is increased, but this seems like a dangerous practice, which leads to harder-to-understand code. Avoid whenever possible.
Checklist: Control Structures
for
loops are mostly used for iterative algorithms with a known number of loop iterations.
while
loops are used to executed a sequence of commands until a break condition is met.
The code in the do-while
loop is executed at least once.
With break;
the loop is immediately canceled and the line after it will be executed.
With while(1){..}
I can create an endless loop on purpose. I must at some point (within the block delimited by the curly braces) break it with the break;
statement.
In some situations, endless loops that are being broken are an ugly but easily implemented solution. Use as sparingly as possible and with great caution.
There’s no algorithm with finite runtime that can determine whether a program determines within a limited amount of time.
In many cases switch-case
is a more efficient alternative to if...else if...else
. switch
needs an integer argument.
Arrays and Structs
Arrays
One-dimensional arrays can best be thought of as one-dimensional row vectors. Two-dimensional arrays correspond to matrices. Arrays of higher dimensions analogous.
Important to consider is the fact that arrays can only store data of a single data type. E.g. you can only store integers into an integer array.
Code Example
int main(){ char fruits[5][20] = {"Apple", "Pear", "Strawberries", "Peach", "Lemon"}; float prices[5] = {1.00, 1.50, 2.50, 2.00, 1.35}; int q = 1, curFruit = 0; printf("Welcome at the Buy-O-Tron!\n"); while(q==1){ printf("Enter the product number between one and five.\n"); scanf(" %d", &curFruit); printf("You have chosen %s. Price: %1.2f€\n", fruits[curFruit-1], prices[curFruit-1]); printf("Do you have another wish?\n If so press 1.\n"); scanf(" %d", &q); } return 0; }
Output for curFruit = 4
.
Enter a product number between one and five.
4
You have chosen peach. Price: 2.00€
Do you have another wish?
If so press 1.
0
Let’s take a look at the declaration of an array: Datatype name[length];
for one-dimensional arrays, Datatype name[rows][cols];
for two-dimensional arrays, analogous for arrays of higher dimensions.
In order to create an array of length $n$ one simply writes int array[n];
. Important to note is that the enumeration begins with 0. As a result, to access the first element of an array int arr[length]
you have to write int i = arr[0]
. Respectively, the $n^{\text{th}}$ element can be accessed like this: int i = arr[n-1];
You got to be careful about char
arrays, though. A char
is a character. You might have heard about strings
before, which seemed to be some sort of char
arrays. This is almost true in C, as the difference between an array of chars
and a string
is only that the latter is terminated by a binary zero. However, this might not be true for other programming languages. Generally speaking, strings are just an abstraction from chars
, saying nothing about their implementation.
One subtle thing to note is that it is due to that terminating binary zero at the end of each string that a string of length $n$ needs $n+1$ bits of memory.
Please also keep in mind that chars
are enclosed by single quotation marks as in char c = 'c';
, wheras strings are enclosed by double quotation marks as in char s[] = "c++";
. Note that the empty square brackets denote an array of unknown size. However, upon initilization the size of the array must be fixed, in this case it is 4.
Example: Let’s assume that I would like to store the two strings “Adam” and “Eva” into an array. Since strings are arrays of characters the desired array needs to be at least of dimension 2 to store both names. As a result, I need an array of at least the following size: char names[2][5];
. To know get the ‘m’ from “Adam” one would write int m = names[0][3]
.
Advantages and Disadvantages of Arrays
Advantages | Disadvantages |
|
|
Structs
A struct
is an abstraction of properties. With a struct
one can e.g. define points in the three-dimensional space. A point in the three-dimensional space is characterized by its three spatial coordinates. This is due to the fact that they provide a unique representation of said point. On this basis, one can construct geometrical objects, such as a sphere which can be uniquely defined with the center point and the radius.
An advantage of structs
is that with these abstract definitions a lot of functionality can be implemented without great hassle.
The insight that increasing level of abstraction leads to lower implementation effort and at the same time increased functionality motivates the programming paradigm of object-oriented programming in C++.
Code Example
#include <stdio.h> #include <math.h> typedef struct{ float x; float y; }point; typedef struct{ point center; float radius; }circle; float square(float x){ return x*x; } int numberOfIntersections(circle A, circle B){ float ax = A.center.x; float ay = A.center.y; float bx = B.center.x; float by = B.center.y; float dist = sqrt(square(ax-bx)+square(ay-by)); if((A.radius+B.radius)<dist) return 0; else if((A.radius+B.radius)==dist) return 1; else return 2; } int main(){ point P = {1.0, 2.0}; point Q = {2.0, 2.0}; circle A = {P, 3}; circle B = {Q, 1}; int intersections = numberOfIntersections(A, B); printf("The circles A and B have %d intersections", anzahl); return 0; }
Output
Declaration of structs
and typedef
A struct
can be declared, defined and “used” through various means. Here’s an explanatory code example with comments for that matter.
/* Nonuniform notations on purpose. This is to present all possible variations */ struct point{ float x, y; }P,Q; /*The points P and Q are declared here. */ typedef struct{ struct point center; float radius; }circle; /* With this typedef, a new data type "circle" is defined. With typedef, you can also define aliasing datatypes, such as typedef unsigned int uint. In such a setting the name of the struct becomes obsolete, nevertheless can be defined, resulting in two possible names for the same struct, also called aliasing (omitted for circle, for cylinder the name is Cylinder). To be clear: both struct Cylinder myCyl; and cylinder myCyl; are valid declarations. */ typedef struct Cylinder{ float height; circle base; }cylinder; /* Without typdef struct must not be omitted */ /* In C++ this is less cluttered, though... */ struct point O; /* declaration O */ O = {1, 1}; /* initialization O */ P = {2, 3}; /* initialization P */ /* With the .-operator one can access the attributes of the struct */ Q.x = 1; /* assignment Q.x = 1 */ Q.y = 2; /* assignment Q.y = 1 */ /* more examples */ cylinder W = {5.0, {{2.0,3.0},5.0}}; cylinder X, Y, Z; X.height = 4; X.base = {{0,0},5}; Y.height = 3; Y.base = {P, 2}; circle C = {Q, 2}; Z = {7, C}; /* Aren't these familiar ?! Hooray, pointers!*/ cylinder* cptr = &Z; /* What the hell is this ?*/ cptr->height = 5; /* cptr->height is a little abbreviation for (*cptr).height! "->" is called arrow operator */ /* Memorize this for abstract data types */
Checklist: Arrays and Structs
Arrays are vectors, respectively matrices (also 3d and higher dimensions), that can store data of a single data type. At runtime the size of the array is fixed, i.e. it is fixed.
Enumerating the elements of an array one starts to count at 0.
(In C) strings are char
-arrays terminated by a binary zero. String arrays are two-dimensional char
arrays respectively.
chars
are enclosed by single quotation marks ”. strings
are enclosed by double quotation marks “”.
structs
are abstract “user-defined” data structures, whose advantages motivated objected-oriented programming.
structs
can contain attributes of different data types.
With the dot operator, one can manipulate the attributes of structs
. With the arrow operator, one can manipulate structs
by applying it to a pointer (that points to a struct).
The arrow operator is an abbreviation for (*ptr).attr !
Functions
Functions comprise of a data type, a name, possibly input arguments, the function itself and (possibly) a return value.
Code Example
main.c
int main(){ return myFunction(5); } int myFunction(int myArgument){ if (myArgument == 5) return 2; return 1; }
Explanation
Firstly, main()
is called without input arguments. As return value, an integer is expected. The result (return value) of the myFunction()
with the input argument int myArgument = 5
will be main’s return.
Subsequently, let’s a look at myFunction()
. The return value of this function is integer and this indeed it must be, because the main()
function also returns a value of type integer. The input arguemnt is int myArgument
. If this argument is 5, 2 should be returned. Then, the function should immediately terminate without executing any following statements in that scope. The function would return 1 if myArgument
had any other value.
Important: It is crucial to know that the arguments of a function can be passed by value or by reference. In the case of passing by reference, a reference to the variable is passed to the function, i.e. a pointer to its memory address. So if we dereference that pointer we can directly the value of the original variable. In contrast to passing by reference, passing by value will produce a copy of the passed value and use this instead of (possibly) manipulating the variable itself.
Note: Functions of type void
do not expect return values. However, to instantly terminate them you can write return;
.
Note: If you want a function to have multiple return values you can either return a struct or nimbly work with calling/passing by reference. Which method is preferable is situational and often times just a matter of taste or personal preference.
Note: The data type of main()
can only be void
or int
in C, or only int
in C++.
Note: Functions (declarations) can be outsourced into so-called headers (file extension .h) to keep main.c
readable. The respective function definitions can be done in the header files, but ideally, they should be put into .c files with the same name as the header. The headers can be included into each .c file the same way as a library. This is simply accomplished by a simple #include
directive. Note that these headers are included with quotation marks (“”) instead of angle brackets.
C Standard Library
The C Standard Library is a normed collection of functions for each C standard. At this point, I will give a short introduction to the most important functions from several headers. Note that I will only introduce a very small portion of the functions.
Important note : Whenever possible one should resort to C Standard Library functions because they are generally implemented close to the maximum of possible efficiency and after all already implemented. However, for academical reasons it might be a good idea to try to build some of these functions by hand just to improve coding style and check on your own skills.
<stdio.h>
<stdio.h>
stands for standard input/output, consequently this header contains important functions for the input through external files or user interaction, as well as functions to output respective data.
In this short little introduction, we will only make use of the printf
as well scanf
.
With printf
we can output text such as “Text”. An example to clarify the syntax: printf("All my ducklings swim on the lake, they are %d.", numberOfDucklings);
. In this case the only thing we do not know yet is what %d
(format specifier) means. For this purpose a small table.
Format Specifier | Meaning |
%d | Integer (decimal) |
%f | Floating point number (float) |
%e | Scientific notation (exponent) |
%c | Character |
%s | String |
%<NUMBER>d | <NUMBER> states the amount of space reserved for the display. Unused space is filled up with space characters. |
%0<NUMBER>d | <NUMBER> states the amount of space reserved for the display. Unused space is filled up with space characters. |
%.<NUMBER>d | <NUMBER> states the precision, which means the number of decimals displayed. |
scanf
uses the same format and the same format specifiers as printf
. Note, that scanf
needs the memory address of the variable into which the user input is to be stored. Strings are exceptional because those are implemented as pointers to binary zero terminated char arrays in C. Example: scanf("%d", &favouriteNumber);
<stdlib.h>
<stdlib.h>
stands for standard library. Among other things it contains important functions for dynamic memory management, search, converting, random and environmental functions.
In this introduction we will employ the functions malloc
and free
from this header. In advanced parts of this series we will also take a look at bsearch
and qsort
.
With malloc
one can dynamically allocate (reserve) memory on the so-called heap. With the heap, one can implement dynamic data structures such as lists, vectors, maps and more. In this context, dynamic means that space is reserved on-the-fly at runtime. We can insert and delete elements into these structures at leisure. This happens without waste of memory or the danger of program crash due to memory access violations. malloc()
returns a void pointer to the created object. To get a different type of pointer one can explicitly typecast: int* myIntptr = (int*)malloc(sizeof(int));
.
Respectively, we release allocated memory with the free()
command. It is a very good practice to tidy up the mess you created. This is useful for both saving memory that might be needed and also to later avoid unexpected inexplicable errors.
<math.h>
<math.h>
is a header containing numerous mathematical functions, including but not limited to trigonometrical, power and exponential functions, as well as rounding and comparing functions.
In this series we are using the functions sqrt
, pow
and floor
.
<string.h>
<string.h>
helps us manipulating strings and arrays. Things we can do include comparing strings, append strings to one another and searching for strings and substrings.
In this series, we won’t look at any functions of this headers, despite their integral importance.
Checklist: Functions
Functions comprise of a data type, a name, possibly input arguments, the function implementation itself and exactly one return value where applicable.
C programmers have access to the time-tested and vast C Standard Library. The contained functions are very useful and highly efficient. One should resort to these whenever possible.
Tremendously important functions for us include: the printf
family, scanf
, malloc
, free
, sqrt
, abs
, floor
.
Interesting math and random functions include: sin
, cos
, tan
, asin
, acos
, atan2
, pow
, rand
, srand
Runtime Analysis
What is runtime?
The runtime of a program denotes the time that a program needs for execution depending on certain input variables. For the analysis of abstract algorithms, it turned out that it often times is better to measure the number of steps (e.g. number of multiplications) needed to compute the result instead of the raw time. The reason for this is that the runtime in its original sense is hardware-dependent and hardly comparable across different platforms. Different operations like multiplication and summation might have different “prices” on different hardware.
The Landau- or O-Notation
Due to the fact that many problems have sizeable input arguments, it makes sense to express runtime in some kind of polynomial. In most, but not all cases constants are neglected in these considerations for two main reasons. At the one hand, it often is hard to determine these constants for all kinds of hardware. On the other hand, many of these constants are negligible once the runtime parameter grows. However, for if that parameter is small and you have “cheap” hardware these constants might have to be taken into account, perhaps complicating things a lot.
However, if you try to assess the quality of two concurrent algorithms you most of the time only look at the highest exponent of the runtime polynomial, provided that it can be expressed in polynomial form. In other words, we “estimate like engineers” and neglect small influences, i.e. exponents. What is left is an order, the order of a polynomial. We denote $ n^3+2n-27 \in O\left(n^3\right) $.
A healthy understanding of scales lets us intuitively find the faster, usually the better algorithm for our purpose. This ultimately leads to very simple and intuitive computation rules for the O-notation. The “difficult” part is to extract a mathematical model describing runtime out of the source code.
Checklist: A Few Calculation Rules for The O-Notation
- $ O\left(n^x+n^{x-1}+\dots +1\right) = O\left(n^x\right) $ $\forall x \in \mathbb{R} $
- $ O\left(c\right) = O\left(1\right) $ $\forall c \in \mathbb{R} $
- $ O\left(cA\right) = O\left(A\right) $ $\forall c \in \mathbb{R} $
- $ O\left(A\right)+O\left(B\right) = O\left(A+B\right) $
- $ O\left(A\right)\cdot O\left(B\right) = O\left(A\cdot B\right) $
- $ O\left(n^x\right) < O\left(2^n\right) < O\left(n!\right) $
- $ O\left(O\left(A\right)+O\left(B\right)\right) = O\left(A\right) \Leftrightarrow O\left(A\right) > O\left(B\right) $
Code Example (not executable)
Exercise: Determine the runtime of main()
in dependency of m and n.
int main(){ /* In real programs k, l, m, n have actual values. */ int j, k, l, m, n; for(j=0;j<n;j++){ funcA(k,l,m); /* Sei Laufzeit hier k*l+2m */ funcB(l,m); /* Sei Laufzeit hier l^2*m */ funcC(n) /* Sei Laufzeit hier 5n^2+m^2*/ } }
Solution
Let $t$ denote the total runtime.
Let $f$ denote the runtime of the for loop.
Let $a,b,c$ denote the runtimes of the respective functions.
The for
loop is traversed through $n$ times. $ \Leftrightarrow t = n\cdot f $
The functions run are in sequence, thus:
$ \Leftrightarrow t = n\cdot (a+b+c) $
Plugging it in:
$ \Leftrightarrow t = n\cdot (k\cdot l+2m+l^2 \cdot m+5n^2+m^2) $
Note, that the question is asking for the runtime regarding the input arguments $m$ and $n$:
$ \Leftrightarrow t \in O\left( n\cdot \left(1+2m+m+5n^2+m^2\right)\right) $
Let’s now make use of the calculation rules of the O-Notation:
Exercise: Execute each of the steps on a piece of paper and write down the rules you have used. This is a very common task in various exams.
$$ \begin{aligned}
& O\left(n\cdot \left(1+2m+m+5n^2+m^2\right)\right) & = \quad & O\left(n\cdot \left(n^2+m^2\right)\right) \\
\quad \Leftrightarrow \quad & O\left(n\cdot \left(n^2+m^2\right)\right) & = \quad & O\left(n^3+m^2n\right) \\
\end{aligned}
$$
where we can find c basic?
This post is part of a series. The links to parts two and three are provided in the sidebar 🙂
Well, once they are ready that is.