Threading Explained

Written on March 11th, 2017 by Kishu Agarwal

Normally the programs that you write are single-threaded. By single-threaded, I mean that the processor starts at the first instruction in the program and then executes them one-by-one in the sequential manner, of course with some branching here and there, until eventually it reaches the last instruction in the program and the program ends.

But there are cases when you need to somehow do two or more pieces of work at the same time or do them independent of other pieces. Let’s suppose you are building a desktop application which has two buttons, Start and Cancel. Pressing on Start would start some long running task and pressing Cancel should cancel that long running task.

If you try to build the above application in the normal way (single-threaded way), you would see the main drawback of the single threaded model of execution. When the user presses on Start, the long running task would start running and since the whole appplication is running in a sequential manner, your application freezes until that task is finished. And this means, user won’t be able to cancel the task by pressing Cancel button.

This is one of the case, where threads can help you. But before understanding what threads are, you need to understand what is a Process.

A Process is an instance of your program while it is running. Your program instructions residing in some file on the hard disk doesn’t constitute a Process. When that program is actually launched, then a Process is created for your program by the operating system.

Process mainly consists of following things

Instructions to execute
The global variables and constants in the program
Stack for managing the function calls
Heap for dynamically allocating the memory
File descriptors for the open files and sockets
Permissions like the User Id, Group Id, etc
State of the program execution like the program counter and the registers

Without threads, Process is the basic unit of execution that an operating system can schedule to run at a moment of time. When there are multiple Processes, the operating system allocates each Process just a quantum of time and keeps doing the switch from one Process to the another. To make this switching possible, the operating system saves the current state of the Process in an structure called Process Control Block which has basically the same information described above. This enables to restore the state of the Process when it is later rescheduled by the operating system and to resume from the same point where it left off.

Thread

Threads allows us to have multiple concurrent flow of execution in a single Process. With threads, we have one more level of scheduling that the operating system can do. And that is, which thread out of all the threads in the Process to execute at that moment.

When you launch a program, the operating system creates a Process for that program with all the information as described above, and starts the execution of the instructions. This is the main thread of execution.

Now what you can do is, you can create a new thread in the same program. A thread will have following things,

As you can see, most of the things thread shares with the Process. You can think of the Process as the unit of resource allocation for a program and Thread as the unit of execution. Process has the resources such as memory, file descriptors and also other things such as code to be executed and permissions. All the threads that you will create will share these things with the Process but will have some additional things of their own.

Each thread will have a local thread stack. Since each thread is a own flow of execution in a program, each thread needs its own stack to keep track of the call stack.
Each thread also needs to store its state. State includes the contents of the registers and the program counter when that thread was put in paused state.

If you are familiar with the fork function in C language, threads are not the same things. fork command is used to create child Processes of the current Process. The things that you can do with threads, you can also do them with these child Processes, but creating these child Processes and doing context switching between them is way more expensive than creating threads and doing context switching among threads.

The reason for this is very simple -

When creating a child Process, whole copy of the parent Process is created, and that includes the file descriptors, permissions, code to be executed, etc. The child Process is a whole life bearing Process in itself, like it’s parent Processs. But while creating a thread, since each thread shares most of the things with the Process, it needs to create only the local thread stack and the state, which is very cheap.
Doing a context switch on a Process takes much more time as compared to context switch on a Thread. The reason being, process has to save more information as compared to the thread.

A very minimal multithreaded program

Enough with the theory. Let’s see a working demo of multithreaded program.

#include <pthread.h>
#include <stdio.h> 

int num_threads = 10;
int thread_num[10];

void* thread_function(void* arg) {
    int t= *(int *)(arg);
    printf("Hi from thread %d\n", t);
}

int main(void) {
    pthread_t threads[num_threads];
    for (int i = 0; i < num_threads; i++) {
        thread_num[i] = i;
        if (pthread_create(&threads[i], NULL, thread_function, &thread_num[i])) {
            printf("Error in creating thread %d\n", i);
        }
    }
    for (int i = 0; i < num_threads; i++) {
        if(pthread_join(threads[i], NULL)) {
            printf("Error in joining thread %d\n", i);
        }
        
    }
    return 0;
}

For working with threads, you will have to import the pthread.h header file. This file contains the declarations related to all the thread related functions.
In the main function, I have created a array of type pthread_t, which is nothing but an integer denoting the id of a thread.
Then we loop 10 times, and create 10 threads using the pthread_create function. This function accepts four arguments. First is the memory address where the id of the thread being created has to be stored. Here we pass the address of the corresponding position in the array thread_num for that thread. Second argument is some attributes for how to create the thread. We will pass NULL to accept the defaults. In third, we have to pass the function which will be executed by the thread that is being created. We pass thread_function here. Fourth argument is data that we can pass to the function that the thread will execute. Here we pass the number of the thread as the argument.
pthread_create function will return 0 when success, and an non-zero value when there is some error in creating the thread. That’s why, I have wrapped that call in an if, to capture the error.
When a thread is created, the new thread will execute the function given to it, while the current thread will move on to the next instruction. So, here the new threads will be running the thread_function while the main thread, will continue running the loop, creating new threads.
After creating all the threads, the main thread then calls pthread_join on each thread that was created. pthread_join function is used to join another thread to the current thread. It is a blocking call, so until the other thread is not finished, the current thread will be blocked. It is always nice to clean up all the threads that you have created in a program this way.
pthread_join accepts two arguments. First is the thread id to which to wait for. Second is the memory address where to store the value returned the other thread. Here we pass NULL as we don’t have any value returned by the thread_function.

Run the above program like this,

gcc thread.c -lpthread

thread.c is the name of the file, and pthread is the library which has the actual object files for all the functions. You would get something like the following as the output,

Hi from thread 0
Hi from thread 1
Hi from thread 5
Hi from thread 3
Hi from thread 6
Hi from thread 2
Hi from thread 4
Hi from thread 7
Hi from thread 8
Hi from thread 9

As it must be obvious from the output above, the different threads executes in different order and you can never predict their order. And sometimes this can lead to some very hard to find bugs in your program.

Non-exclusive access to shared object

Consider this code,

#include <pthread.h>
#include <stdio.h>
int x = 0;
void* thread_function(void* arg) {
    for (int c = 0; c < 1000000; c++) {
        int i = x;
        i = i + 1;
        x = i;
    }
}
int main(void) {
    pthread_t thread;
    if (pthread_create(&thread, NULL, thread_function, NULL)) {
        printf("Error in creating thread \n");
    }
    for (int c = 0; c < 1000000; c++) {
        x = x + 1;
    }
    if(pthread_join(thread, NULL)) {
        printf("Error in joining thread\n");
    }
    printf("Final value of x is %d", x);
    return 0;
}

When you run this program, there are chances that you will see different value of x each time you run it. On my machine, I got the following results on 5 execution of these program.

Final value of x is 1161394
Final value of x is 1100001
Final value of x is 1062798
Final value of x is 1205648 
Final value of x is 1533644

Suppose the current value of variable x is 10. Then the second thread executes the instruction int i = x and stores the value 10 in i. But then this thread get context switched with the first thread, and the first thread executes the instruction x = x + 1 and increments the value of x to 11. Now when the second thread comes back, it will increment i to 11 and store these value in variable x, replacing the old value of x.

The reason this happenend, is non exclusive manipulation of the global variable x by the two threads. Reading the value of x and writing to it, is not an atomic operation. So, it may happen that after reading the value of x by one thread, some other thread may modify this value of x before the first thread gets the chance to save it’s value.

Locking

To solve this problem, we will have to use locks.

#include <pthread.h>
#include <stdio.h>

int x = 0;
pthread_mutex_t x_mutex;

void* thread_function(void* arg) {
    for (int c = 0; c < 1000000; c++) {
        pthread_mutex_lock(&x_mutex);
        int i = x;
        i = i + 1;
        x = i;
        pthread_mutex_unlock(&x_mutex);
    }
}

int main(void) {
    pthread_t thread;

    pthread_mutex_init(&x_mutex, NULL);

    if (pthread_create(&thread, NULL, thread_function, NULL)) {
        printf("Error in creating thread \n");
    }
    for (int c=  0; c < 1000000; c++) {
        pthread_mutex_lock(&x_mutex);
        x = x + 1;
        pthread_mutex_unlock(&x_mutex);
    }
    if(pthread_join(thread, NULL)) {
        printf("Error in joining thread\n");
    }
    pthread_mutex_destroy(&x_mutex);
    printf("Final value of x is %d", x);
    return 0;
}

Here we use mutexes, for locking. A mutex variable is kind of lock which protects some resource from simulataneous access and modification. The way to use mutexes is, a thread at one moment will lock or own the mutex variable when trying to access some shared variable, and all other threads then will have to wait for this thread to unlock the mutex variable, before they can do anything with that shared variable. Thus, the mutex variable acts a gatekeeper for that shared resource and only one thread is allowed to enter the gate at one moment.
First we define a mutex variable using pthread_mutex_t and initialize it in main function using pthread_mutex_init function passing in the address of the mutex variable. The second argument is used for setting some extra attributes for the mutex variable, which we don’t need right now, so just pass in NULL.
Main functions are pthread_mutex_lock and pthread_mutex_unlock. Before we access the x variable, we first acquire the lock on the mutex, and after we are done with x, we unlock the mutex, so the other thread can now access the x.
After we are done with the mutex, we free the mutex using pthread_mutex_destroy function, passing in the address of the mutex variable.

Properly dealing with mutexes and locks in your program can be a very challenging task. It requires some experience and good understanding of your program flow. Keep practicing, and it won’t be daunting task one day.

Before I finish this article, I just wanted to share one more example of using threads.

Implementing Merge Sort using threads

Below is the program which uses threads for doing merge sort. Feel free to go over the code. You should be able to understand the code as I have covered almost the functions in above examples.

#include <pthread.h>
#include <stdio.h>

typedef struct bound {
    int low;
    int high;
} bound;

int a[] = {5,6,1,2,3,7,4,8,9,0,11,10};
int size = 12;

void* merge_sort(void* arg) {
    bound limits = *((bound *)arg);
    int low = limits.low;
    int high = limits.high;

    // If the size is one, don't do anything
    if (low == high) {
        return;
    }
    // If the size of the array is 2 or more
    // Then break the range into two parts

    int mid = (low + high) / 2;

    // Create two threads to assign the task of 
    // sorting the two halfs of the array

    bound left = {.low=low, .high=mid};
    bound right = {.low=mid+1, .high=high};

    pthread_t left_thread, right_thread;
    
    if (pthread_create(&left_thread, NULL, merge_sort, &left)) {
        printf("Error in creating the thread at point %d %d\n", low, mid);
    }
    if (pthread_create(&right_thread, NULL, merge_sort, &right)) {
        printf("Error in creating the thread at point %d %d\n", mid+1, high);
    }
    
    // Wait for both the threads to complete
    pthread_join(left_thread, NULL);
    pthread_join(right_thread, NULL);

    // Now combine the two sorted sub-halfs into sorted array
    // Note: This can be done inplace also.
    
    //Create a temporary array to hold the sorted array
    int temp_size = high - low + 1;
    int temp[temp_size];

    int left_index = low;
    int right_index= mid+1;
    for (int i = 0; i < temp_size; i++) {
        if (left_index <= mid && right_index <= high) {
            if (a[left_index] < a[right_index]) {
                temp[i] = a[left_index];
                left_index++;
            } else {
                temp[i] = a[right_index];
                right_index++;
            }
        } else {
            if (left_index <= mid) {
                temp[i] = a[left_index++];
            } else {
                temp[i] = a[right_index++];
            }
        }
    }

    // Copy the temp array to the original array
    for (int i = 0; i < temp_size; i++) {
        a[low+i] = temp[i];
    } 
}

int main(void) {
    pthread_t mythread;
    bound initial = {.low=0, .high=size-1};

    // Call merge sort on the entire array
    merge_sort(&initial);

    // Print the sorted array
    for (int i = 0 ; i < size; i++) {
        printf("%d ", a[i]);
    }
    return 0;
}

Thank you for reading my article. Let me know if you liked my article or any other suggestions for me, in the comments section below. And please, feel free to share :)

Feel free to share!