There are CPU instructions (atomic operations such as compare-and-swap or test-and-set) that guarantee exclusive access to a particular memory location, but they operate below the OS scheduler, which has no idea you are using them to wait. Just use one of the OS-provided locks, such as a mutex. Unfair locks can also improve throughput, but at the cost of latency, since an unlucky waiter can be passed over repeatedly.
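To make the contrast concrete, here is a minimal, illustrative C++ sketch (the class and function names are mine, not from any quoted source): the naive spinlock is built directly on one of those exclusive-access instructions, an atomic test-and-set, while std::mutex is the OS-provided lock that actually cooperates with the scheduler.

```cpp
#include <atomic>
#include <cstdio>
#include <mutex>
#include <thread>

// Naive spinlock built directly on an atomic test-and-set instruction.
// The scheduler has no idea a thread spinning here is "waiting" -- it just
// sees a thread happily burning CPU.
class NaiveSpinlock {
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
public:
    void lock()   { while (flag_.test_and_set(std::memory_order_acquire)) { /* spin */ } }
    void unlock() { flag_.clear(std::memory_order_release); }
};

int counter = 0;
std::mutex counter_mutex;  // OS-provided sleeping lock: contended waiters block in the kernel

void increment_many() {
    for (int i = 0; i < 100000; ++i) {
        std::lock_guard<std::mutex> guard(counter_mutex);
        ++counter;
    }
}

int main() {
    std::thread a(increment_many), b(increment_many);
    a.join();
    b.join();
    std::printf("counter = %d\n", counter);  // 200000
}
```

The NaiveSpinlock class is there only to make the comparison visible; the quote below explains why you should not ship it.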
Linus Torvalds on spinlocks, unfair locks, etc.
So what's the fix for this?
Use a lock where you tell the system that you're waiting for the lock, and where the unlocking thread will let you know when it's done, so that the scheduler can actually work with you, instead of (randomly) working against you.
Notice, how when the author uses an actual std::mutex, things just work fairly well, and regardless of scheduler. Because now you're doing what you're supposed to do. Yeah, the timing values might still be off - bad luck is bad luck - but at least now the scheduler is aware that you're "spinning" on a lock.
Or, if you really want to use spinlocks (hint: you don't), make sure that while you hold the lock, you're not getting scheduled away. You need to use a realtime scheduler for that (or be the kernel: inside the kernel spinlocks are fine, because the kernel itself can say "hey, I'm doing a spinlock, you can't schedule me right now").
But if you use a realtime scheduler, you need to be aware of the other implications of that. There are many, and some of them are deadly. I would suggest strongly against trying. [...]
Because otherwise you're going to at some time be scheduled away while you're holding the lock (perhaps after you've done all the work, and you're just about to release it), and everybody else will be blocking on your incorrect locking while you're scheduled away and not making any progress. All spinning on CPU's.
Really, it's that simple.
This has absolutely nothing to do with cache coherence latencies or anything like that. It has everything to do with badly implemented locking.
I repeat: do not use spinlocks in user space, unless you actually know what you're doing. And be aware that the likelihood that you know what you are doing is basically nil.
There's a very real reason why you need to use sleeping locks (like pthread_mutex etc).
[...] Because you should never ever think that you're clever enough to write your own locking routines. Because the likelihood is that you aren't (and by that "you" I very much include myself - we've tweaked all the in-kernel locking over decades, and gone through the simple test-and-set to ticket locks to cacheline-efficient queuing locks, and even people who know what they are doing tend to get it wrong several times).
There's a reason why you can find decades of academic papers on locking. Really. It's hard.
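For a concrete picture of what "tell the system that you're waiting for the lock" means in practice, here is a hedged, Linux-specific sketch of a futex-backed sleeping lock, loosely following the simple mutex from Ulrich Drepper's paper "Futexes Are Tricky" (the helper names are mine, and real pthread_mutex and std::mutex implementations are far more elaborate). A contended waiter asks the kernel to put it to sleep on the lock word, and the unlocking thread asks the kernel to wake one sleeper; that is the cooperation with the scheduler that a pure userspace spinlock never gets.

```cpp
#include <atomic>
#include <thread>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

// Thin wrapper over the futex system call (Linux only). Treating the
// std::atomic<int> as a plain int* is a sketch-level simplification.
static long futex(std::atomic<int>* addr, int op, int val) {
    return syscall(SYS_futex, reinterpret_cast<int*>(addr), op, val, nullptr, nullptr, 0);
}

// Compare-and-swap that returns the previous value, like the kernel's cmpxchg.
static int cmpxchg(std::atomic<int>& v, int expected, int desired) {
    v.compare_exchange_strong(expected, desired);
    return expected;  // on failure, compare_exchange_strong stores the old value here
}

class FutexLock {
    // 0 = unlocked, 1 = locked with no waiters, 2 = locked and possibly contended
    std::atomic<int> state_{0};
public:
    void lock() {
        int c = cmpxchg(state_, 0, 1);               // fast path: 0 -> 1, no syscall
        if (c == 0) return;
        do {
            // Make sure the lock word says "contended", then sleep in the kernel.
            // FUTEX_WAIT only sleeps if state_ is still 2, so the wake-up from
            // unlock() cannot be lost.
            if (c == 2 || cmpxchg(state_, 1, 2) != 0)
                futex(&state_, FUTEX_WAIT_PRIVATE, 2);
        } while ((c = cmpxchg(state_, 0, 2)) != 0);  // retry the acquire
    }
    void unlock() {
        if (state_.fetch_sub(1) != 1) {              // someone may be sleeping
            state_.store(0);
            futex(&state_, FUTEX_WAKE_PRIVATE, 1);   // kernel wakes one waiter
        }
    }
};

int main() {
    FutexLock lock;
    int counter = 0;
    auto work = [&] {
        for (int i = 0; i < 100000; ++i) {
            lock.lock();
            ++counter;
            lock.unlock();
        }
    };
    std::thread a(work), b(work);
    a.join();
    b.join();
    return counter == 200000 ? 0 : 1;
}
```

Even this toy version shows why the sleeping-lock path behaves well: when the holder gets preempted, the waiters are asleep in the kernel rather than spinning on CPUs, and the scheduler can run whoever actually has work to do.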