Atomicity and Atomic Instructions in Linux

Introduction #

This is my understanding of Synchronization concepts in the Linux Kernel.

The need for synchronization? #

Imagine that there are multiple threads that are trying to write to a global/shared data at a given point in time. Which thread is going to update the data first? What happens to shared data in case of a race condition?

So how do we guarantee that in the case of concurrent execution the writable shared data is accessed by only one thread?

Any piece of code that can execute in parallel and has writable shared data is called a Critical Section. This critical section has to be run either atomically i.e w/o interruption or exclusively by one thread at any given point. We require synchronization because threads can concurrently execute Critical Section where writable shared data is being worked upon. Even though other parts of the code can run in parallel we need to ensure that the code inside the Critical Section is serialized. To achieve this we use locks. When we use locks we need to ensure that there is just one key. So whenever multiple threads compete for it, only one thread will get the lock. And what will happen to the threads that didn’t get the lock? It depends on weather the lock is a blocking or a non blocking call. If in case of a blocking call, the other threads would have to wait for the lock to unlock.

So if the Critical Section needs to be serialized, doesn’t that defeat the purpose of parallelism? It sure does! but we need to design such that there is minimal locking in the code.

Lets take a look at the following example:

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/kthread.h>
#include <linux/delay.h>
#include <linux/atomic.h>

volatile unsigned int counter=0;
struct task_struct *t1, *t2, *t3;

int Kthread_start(void *arg){
	int i;
	int local ;
	for(i=0; i<1000; i++){
		local = counter;
		local = local +1;
		counter = local;

	}
	pr_info("counter value %d\n", counter);
	pr_info("Thread operation complete by %s\n", (char*)arg);
	return 0;
}


int kthread_init(void){
	t1 = kthread_create(Kthread_start,"thread1", "Kthread1");
	if(IS_ERR(t1)){
		pr_err("Error starting kernel thread 1\n");
	}

	t2 = kthread_create(Kthread_start,"thread2", "Kthread2");
	if(IS_ERR(t2)){
		pr_err("Error starting kernel thread 2\n");
	}
	t3 = kthread_create(Kthread_start,"thread3", "Kthread3");
	if(IS_ERR(t3)){
		pr_err("Error starting kernel thread 3\n");
	}

	wake_up_process(t3);
	wake_up_process(t1);
	wake_up_process(t2);


	return 0;
}

void kthread_exit(void){

	pr_info("Final counter %d\n",counter);
	pr_info("exiting module\n");
}

module_init(kthread_init);
module_exit(kthread_exit);

MODULE_AUTHOR("Mukunda Aprameya <mail@amukunda.com>");
MODULE_DESCRIPTION("Kernel atomic macros");
MODULE_LICENSE("GPL");

So what does running atomically mean? #

Atomicity is a single operation that is indivisible, i.e they cannot be interrupted and will run to completion. The macros for atomic instructions are available under /include/linux/atomic.h. This is basically a wrapper. Atomic instructions are platform dependent; so a platform supporting atomic instructions, has its own atomic.h file. Lets dive a little deeper and take a look at atomic.h file in the linux kernel. This is a header file containing the atomic instructions for ARM processors. Similarly you can also find the atomic.h file for x86 under arch/x86/include/asm. These are machine op codes. The examples of these atomic instructions are:

fetch-and-add: atomic increment
test-and-set: set a value at memory location and return previous state
compare-and-swap: modify the contents of a memory location only if the previous content is equal to given value

These atomic instruction the base of all locking/synchronization mechanisms in the kernel.

Linux provides two API’s

  • Integer atomic operation
  • Bitwise atomic operation

How do we make use of these instructions? #

We can make use of these instruction by first declaring the shared data as atomic_t. Example static atomic_t i = ATOMIC_INIT(0);.

Lets modify the above code to use atomic counter and increment the counter using the atomic_inc function.

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/kthread.h>
#include <linux/delay.h>
#include <linux/atomic.h>

static atomic_t counter = ATOMIC_INIT(0);
struct task_struct *t1, *t2, *t3;

int Kthread_start(void *arg){
	int i;
	int local ;
	for(i=0; i<1000; i++){
		atomic_inc(&counter);
	}

	pr_info("counter value %d\n", atomic_read(&counter));
	pr_info("Thread operation complete by %s\n", (char*)arg);
	return 0;
}


int kthread_init(void){
	t1 = kthread_create(Kthread_start,"thread1", "Kthread1");
	if(IS_ERR(t1)){
		pr_err("Error starting kernel thread 1\n");
	}

	t2 = kthread_create(Kthread_start,"thread2", "Kthread2");
	if(IS_ERR(t2)){
		pr_err("Error starting kernel thread 2\n");
	}
	t3 = kthread_create(Kthread_start,"thread3", "Kthread3");
	if(IS_ERR(t3)){
		pr_err("Error starting kernel thread 3\n");
	}

	wake_up_process(t3);
	wake_up_process(t1);
	wake_up_process(t2);


	return 0;
}

void kthread_exit(void){

	pr_info("Final counter %d\n",atomic_read(&counter));
	pr_info("exiting module\n");
}

module_init(kthread_init);
module_exit(kthread_exit);

MODULE_AUTHOR("Mukunda Aprameya <mail@amukunda.com>");
MODULE_DESCRIPTION("Kernel atomic macros");
MODULE_LICENSE("GPL");

So can we use this declaration for any and every variable? #

The caveat here being we can declare this only for counters, and the size of the counter is also to be taken into consideration here. The type.h casts two counters. An integer counter and a long counter as atomic_t and atomic64_t respectively.

Taking a look at arm64’s atomic.h file we see number of functions which perform atomic operations add, sub fetch etc. If we see the macro ATOMIC_FETCH_OP calls __lse_ll_sc_body.


#define __lse_ll_sc_body(op, ...)/
({/
	system_uses_lse_atomics()?/
		__lse_##op(__VA_ARGS__):/
		__ll_sc_##op(__VA_ARGS__);/
})

Having a look at atomic_lse.h

So what we do is use a counter and build an api around that counter, the counter value indicating locked if its 0 and unlocked if its 1. Since we don’t allow the counter to go beyond 1, we call this as a binary counter. Without atomic instructions there is no mutex, semaphore or spinlock(which we will look into later) or any other locking api. What you call as a lock is nothing but a atomic 32 or 64 bit counter.

Locking function can be of two types: poll and wait(sleep) The poll implementations are called spinlocks(spin implementations), where as the wait implementations(sleep) are called mutexes(wait implementations) or binay semaphore.

So the programmer has to choose between the above two mentioned implementations.