May 4, 2004
64-bit atomic operations.
At least on the Intel architecture, it is possible to do atomic operations on 64-bit entities even when running on a 32-bit processor, as long as it's a Pentium or later. This is courtesy of the cmpxchg8b instruction, which compares the 64-bit value in %edx:%eax with the target: if they match, it stores %ecx:%ebx into the target and sets ZF; if they don't, it loads the target into %edx:%eax and clears ZF. With the lock prefix, it's atomic. This helps a lot when you have to avoid locks; it turns out that the RTLinux scheduler has no provision to handle priority inversion, so use of a spinlock in certain situations can lead pretty quickly to deadlock. I had to implement a global 'clock' in 64 bits and this instruction saved my bacon.
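To make the instruction's behavior concrete, here is a plain C model of it. This is only a sketch: it is NOT atomic, and the function name cmpxchg8b_model and its parameter names are made up for illustration, standing in for the register pairs.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Non-atomic model of cmpxchg8b (illustrative only, hypothetical name).
 *   edx_eax : in/out, models the %edx:%eax register pair (expected value)
 *   ecx_ebx : models the %ecx:%ebx register pair (replacement value)
 * Returns true where the real instruction would set ZF.
 */
static bool cmpxchg8b_model(uint64_t *target, uint64_t *edx_eax,
                            uint64_t ecx_ebx)
{
    if (*target == *edx_eax) {
        *target = ecx_ebx;   /* match: store the replacement */
        return true;         /* ZF = 1 */
    }
    *edx_eax = *target;      /* mismatch: hand back the current value */
    return false;            /* ZF = 0 */
}
```

The real instruction does all of this as one indivisible step when it carries the lock prefix; the model only shows which value ends up where.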
To read the value:
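__asm__ __volatile__(
" xorl %%eax, %%eax\n"      /* expected value %edx:%eax = 0 */
" xorl %%edx, %%edx\n"
" xorl %%ebx, %%ebx\n"      /* replacement value %ecx:%ebx = 0 */
" xorl %%ecx, %%ecx\n"
"lock; cmpxchg8b %2\n"      /* target == 0: rewrites 0; else loads target */
" movl %%eax, %0\n"         /* either way %edx:%eax now holds the target */
" movl %%edx, %1"
: "=m" (ll_low(out)), "=m" (ll_high(out))
: "o" (target)
: "memory", "eax", "ebx", "ecx", "edx", "cc");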
The "ll_low()" and "ll_high()" macros are Linuxisms to isolate the low and high word, respectively, of a 64-bit entity. They are defined as:
#define ll_low(x) *(((unsigned int*)&(x))+0)
#define ll_high(x) *(((unsigned int*)&(x))+1)
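For comparison, the same compare-with-zero, swap-in-zero read trick can be sketched without hand-written assembly using GCC's __sync_val_compare_and_swap builtin, which appeared in later GCC releases. The helper name atomic_read64 is hypothetical, and as far as I know a 32-bit x86 build needs -march=i586 or later for the compiler to emit cmpxchg8b inline.

```c
#include <stdint.h>

/*
 * Hypothetical helper: atomically read a 64-bit value.
 * Compare-and-swap with expected = 0 and replacement = 0 never changes
 * the target (0 is replaced by 0, anything else is left alone), and the
 * builtin returns the value it found there.
 */
static uint64_t atomic_read64(volatile uint64_t *target)
{
    return __sync_val_compare_and_swap(target, (uint64_t)0, (uint64_t)0);
}
```

Like the assembly version, this does a locked read-modify-write cycle even though the value never changes, so the target has to live in writable memory.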
To increment the target by a given value:
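__asm__ __volatile__(
" movl %0, %%eax\n"         /* %edx:%eax = value we expect to find */
" movl %0+4, %%edx\n"
"1: movl %1, %%ebx\n"       /* %ecx:%ebx = incr, zero-extended */
" xorl %%ecx, %%ecx\n"
" addl %%eax, %%ebx\n"      /* %ecx:%ebx = expected value + incr */
" adcl %%edx, %%ecx\n"
"lock; cmpxchg8b %0\n"      /* store the sum if the target is unchanged */
" jnz 1b"                   /* otherwise retry with the fresh %edx:%eax */
: "+o" (target)
: "m" (incr)
: "memory", "eax", "ebx", "ecx", "edx", "cc");
I loop and redo the addition if the compare failed; in that case cmpxchg8b has reloaded %edx:%eax with the current target, so the retry has to rebuild %ecx:%ebx from incr before adding, rather than adding onto the stale sum. That is why the loop label sits on the reload of incr. The initial two-instruction load of the target is not atomic, but a torn read just makes the first compare fail, and the loop retries with a good value.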
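The retry logic is easier to see in C. The following is a minimal sketch of the same loop built on GCC's __sync_val_compare_and_swap builtin from later GCC releases; it is not the code I used, and the name atomic_add64 is made up for illustration.

```c
#include <stdint.h>

/*
 * Hypothetical helper: atomically add a 32-bit increment to a 64-bit
 * counter and return the new value.
 */
static uint64_t atomic_add64(volatile uint64_t *target, uint32_t incr)
{
    /* This plain load may be torn on a 32-bit CPU; the CAS validates it. */
    uint64_t seen = *target;

    for (;;) {
        /* Recompute the sum from the latest observed value on every pass. */
        uint64_t desired = seen + incr;
        uint64_t prev = __sync_val_compare_and_swap(target, seen, desired);

        if (prev == seen)
            return desired;  /* nobody raced us; the store went through */
        seen = prev;         /* lost the race; retry from the fresh value */
    }
}
```

The point to notice is that desired is rebuilt from seen inside the loop, so a failed attempt never leaks a stale sum into the next one.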
Enjoy. It's a lot faster than using spinlocks.