Thoughts of a programmer

Threading Explained

Normally the programs that you write are single-threaded. By single-threaded, I mean that the processor starts at the first instruction in the program and then executes them one-by-one in the sequential manner, of course with some branching here and there, until eventually it reaches the last instruction in the program and the program ends.

But there are cases when you need to somehow do two or more pieces of work at the same time or do them independent of other pieces. Let’s suppose you are building a desktop application which has two buttons, Start and Cancel. Pressing on Start would start some long running task and pressing Cancel should cancel that long running task.

If you try to build the above application in the normal way (single-threaded way), you would see the main drawback of the single threaded model of execution. When the user presses on Start, the long running task would start running and since the whole appplication is running in a sequential manner, your application freezes until that task is finished. And this means, user won’t be able to cancel the task by pressing Cancel button.

This is one of the case, where threads can help you. But before understanding what threads are, you need to understand what is a Process.

A Process is an instance of your program while it is running. Your program instructions residing in some file on the hard disk doesn’t constitute a Process. When that program is actually launched, then a Process is created for your program by the operating system.

Process mainly consists of following things

Stack
[Not supported by viewer]
Heap
[Not supported by viewer]
Code
[Not supported by viewer]
Global Data
<span style="font-size: 12px">Global Data</span>
File Descriptors
Program Counter
Registers
Permissions
[Not supported by viewer]

Without threads, Process is the basic unit of execution that an operating system can schedule to run at a moment of time. When there are multiple Processes, the operating system allocates each Process just a quantum of time and keeps doing the switch from one Process to the another. To make this switching possible, the operating system saves the current state of the Process in an structure called Process Control Block which has basically the same information described above. This enables to restore the state of the Process when it is later rescheduled by the operating system and to resume from the same point where it left off.

Thread

Threads allows us to have multiple concurrent flow of execution in a single Process. With threads, we have one more level of scheduling that the operating system can do. And that is, which thread out of all the threads in the Process to execute at that moment.

When you launch a program, the operating system creates a Process for that program with all the information as described above, and starts the execution of the instructions. This is the main thread of execution.

Now what you can do is, you can create a new thread in the same program. A thread will have following things,

Thread Stack
<span style="font-size: 12px">Thread Stack</span>
Heap
[Not supported by viewer]
Code
[Not supported by viewer]
Global Data
<span style="font-size: 12px">Global Data</span>
File Descriptors
Permissions
[Not supported by viewer]
Program Counter
Registers
[Not supported by viewer]
Shared 
Across Threads
[Not supported by viewer]
Thread 
Specific
[Not supported by viewer]

As you can see, most of the things thread shares with the Process. You can think of the Process as the unit of resource allocation for a program and Thread as the unit of execution. Process has the resources such as memory, file descriptors and also other things such as code to be executed and permissions. All the threads that you will create will share these things with the Process but will have some additional things of their own.

If you are familiar with the fork function in C language, threads are not the same things. fork command is used to create child Processes of the current Process. The things that you can do with threads, you can also do them with these child Processes, but creating these child Processes and doing context switching between them is way more expensive than creating threads and doing context switching among threads.

The reason for this is very simple -

A very minimal multithreaded program

Enough with the theory. Let’s see a working demo of multithreaded program.

#include <pthread.h>
#include <stdio.h> 

int num_threads = 10;
int thread_num[10];

void* thread_function(void* arg) {
    int t= *(int *)(arg);
    printf("Hi from thread %d\n", t);
}

int main(void) {
    pthread_t threads[num_threads];
    for (int i = 0; i < num_threads; i++) {
        thread_num[i] = i;
        if (pthread_create(&threads[i], NULL, thread_function, &thread_num[i])) {
            printf("Error in creating thread %d\n", i);
        }
    }
    for (int i = 0; i < num_threads; i++) {
        if(pthread_join(threads[i], NULL)) {
            printf("Error in joining thread %d\n", i);
        }
        
    }
    return 0;
}

Run the above program like this,

gcc thread.c -lpthread

thread.c is the name of the file, and pthread is the library which has the actual object files for all the functions. You would get something like the following as the output,

Hi from thread 0
Hi from thread 1
Hi from thread 5
Hi from thread 3
Hi from thread 6
Hi from thread 2
Hi from thread 4
Hi from thread 7
Hi from thread 8
Hi from thread 9

As it must be obvious from the output above, the different threads executes in different order and you can never predict their order. And sometimes this can lead to some very hard to find bugs in your program.

Non-exclusive access to shared object

Consider this code,

#include <pthread.h>
#include <stdio.h>
int x = 0;
void* thread_function(void* arg) {
    for (int c = 0; c < 1000000; c++) {
        int i = x;
        i = i + 1;
        x = i;
    }
}
int main(void) {
    pthread_t thread;
    if (pthread_create(&thread, NULL, thread_function, NULL)) {
        printf("Error in creating thread \n");
    }
    for (int c = 0; c < 1000000; c++) {
        x = x + 1;
    }
    if(pthread_join(thread, NULL)) {
        printf("Error in joining thread\n");
    }
    printf("Final value of x is %d", x);
    return 0;
}

When you run this program, there are chances that you will see different value of x each time you run it. On my machine, I got the following results on 5 execution of these program.

Final value of x is 1161394
Final value of x is 1100001
Final value of x is 1062798
Final value of x is 1205648 
Final value of x is 1533644

Suppose the current value of variable x is 10. Then the second thread executes the instruction int i = x and stores the value 10 in i. But then this thread get context switched with the first thread, and the first thread executes the instruction x = x + 1 and increments the value of x to 11. Now when the second thread comes back, it will increment i to 11 and store these value in variable x, replacing the old value of x.

The reason this happenend, is non exclusive manipulation of the global variable x by the two threads. Reading the value of x and writing to it, is not an atomic operation. So, it may happen that after reading the value of x by one thread, some other thread may modify this value of x before the first thread gets the chance to save it’s value.

Locking

To solve this problem, we will have to use locks.

#include <pthread.h>
#include <stdio.h>

int x = 0;
pthread_mutex_t x_mutex;

void* thread_function(void* arg) {
    for (int c = 0; c < 1000000; c++) {
        pthread_mutex_lock(&x_mutex);
        int i = x;
        i = i + 1;
        x = i;
        pthread_mutex_unlock(&x_mutex);
    }
}

int main(void) {
    pthread_t thread;

    pthread_mutex_init(&x_mutex, NULL);

    if (pthread_create(&thread, NULL, thread_function, NULL)) {
        printf("Error in creating thread \n");
    }
    for (int c=  0; c < 1000000; c++) {
        pthread_mutex_lock(&x_mutex);
        x = x + 1;
        pthread_mutex_unlock(&x_mutex);
    }
    if(pthread_join(thread, NULL)) {
        printf("Error in joining thread\n");
    }
    pthread_mutex_destroy(&x_mutex);
    printf("Final value of x is %d", x);
    return 0;
}

Properly dealing with mutexes and locks in your program can be a very challenging task. It requires some experience and good understanding of your program flow. Keep practicing, and it won’t be daunting task one day.

Before I finish this article, I just wanted to share one more example of using threads.

Implementing Merge Sort using threads

Below is the program which uses threads for doing merge sort. Feel free to go over the code. You should be able to understand the code as I have covered almost the functions in above examples.

#include <pthread.h>
#include <stdio.h>

typedef struct bound {
    int low;
    int high;
} bound;

int a[] = {5,6,1,2,3,7,4,8,9,0,11,10};
int size = 12;

void* merge_sort(void* arg) {
    bound limits = *((bound *)arg);
    int low = limits.low;
    int high = limits.high;

    // If the size is one, don't do anything
    if (low == high) {
        return;
    }
    // If the size of the array is 2 or more
    // Then break the range into two parts

    int mid = (low + high) / 2;

    // Create two threads to assign the task of 
    // sorting the two halfs of the array

    bound left = {.low=low, .high=mid};
    bound right = {.low=mid+1, .high=high};

    pthread_t left_thread, right_thread;
    
    if (pthread_create(&left_thread, NULL, merge_sort, &left)) {
        printf("Error in creating the thread at point %d %d\n", low, mid);
    }
    if (pthread_create(&right_thread, NULL, merge_sort, &right)) {
        printf("Error in creating the thread at point %d %d\n", mid+1, high);
    }
    
    // Wait for both the threads to complete
    pthread_join(left_thread, NULL);
    pthread_join(right_thread, NULL);

    // Now combine the two sorted sub-halfs into sorted array
    // Note: This can be done inplace also.
    
    //Create a temporary array to hold the sorted array
    int temp_size = high - low + 1;
    int temp[temp_size];

    int left_index = low;
    int right_index= mid+1;
    for (int i = 0; i < temp_size; i++) {
        if (left_index <= mid && right_index <= high) {
            if (a[left_index] < a[right_index]) {
                temp[i] = a[left_index];
                left_index++;
            } else {
                temp[i] = a[right_index];
                right_index++;
            }
        } else {
            if (left_index <= mid) {
                temp[i] = a[left_index++];
            } else {
                temp[i] = a[right_index++];
            }
        }
    }

    // Copy the temp array to the original array
    for (int i = 0; i < temp_size; i++) {
        a[low+i] = temp[i];
    } 
}

int main(void) {
    pthread_t mythread;
    bound initial = {.low=0, .high=size-1};

    // Call merge sort on the entire array
    merge_sort(&initial);

    // Print the sorted array
    for (int i = 0 ; i < size; i++) {
        printf("%d ", a[i]);
    }
    return 0;
}

Thank you for reading my article. Let me know if you liked my article or any other suggestions for me, in the comments section below. And please, feel free to share :)