Th.Oughts : Linked list design in the Linux kernel

Recently, a friend of mine asked :

Why does the struct list_head in the Linux kernel has no data field ?

In other words, what he means is this:

The way we are introduced to linked lists in high school is

struct linked_list {
void *data;
struct linked_list *prev *next;
}

And assume, we have a user defined data structure

struct user_data {
int a;
char b;
}

The approach we usually use to create a list would be

struct linked_list head;
struct user_data mydata1;
head.data = (void *)&mydata1;
head.next = NULL;
head.prev = NULL;

And to add another node to the list

struct linked_list newentry;
struct user_data mydata2;
newentry.data = (void *)&mydata2;
newentry.prev = &head;
newentry.next = NULL;

and so on..

However in the Linux kernel, a linked list structure is defined something like

struct list_head {
struct list_head *next, *prev;
}

and to do the same thing as we did above, we do something like

struct user_data {
int a;
char b;
struct list_head new_list;
}

/* Create head */
struct user_data head;
head.new_list.prev=NULL;
head.new_list.next=NULL; ...

/* Add a new node */

struct user_data newentry;
newentry.new_list.prev=&head.new_list;
head.new_list.next = &newentry.new_list; ...

So, in one case we have the data embedded into the linked list (high school style) and in the other, we have the linked list embedded into data. So, what's the advantage of the second over the first ?

I actually couldn't find any documentation justifying this design but these obvious reasons come to my mind.

One advantage is that the kernel programmer is relieved from the additional care he has to take during typecasting if she chooses the first approach. As we all know, typecasting is a necessary evil in C, but using it for linked lists that's invariably used almost everywhere in the kernel is likely to double the number of kernel bugs!

Second, with the first case, with each node created, we end up using more space compared to the second approach. This is because struct linked_list has an additional void *data defined thereby making the node larger in size. This is definitely an important issue in embedded systems if not on x86.

Last but not the least, an important advantage that this design offers is flexibility in list walking. For example, if you have the address of newentry (see above example), you can access newentry->new_list and then go back and forth. Then, using the container_of() macros, any of the associated user_data for any node could be accessed! Even if we just have the address of list_head somewhere in the middle of a linked list, we could jump to the associated data structure. With the first design, in order to do list walking and modifications, we always need the address of any of the node. This means that you always have to pass struct linked_list pointers even if all you really wanted to do is manipulate struct user_data.

Comments