memory barriers in flock (Re: [PATCH v3] locks: close potential race between setlease and open)

From: David Howells
Date: Thu Aug 15 2013 - 16:44:40 EST

Bruce Fields <bfields@xxxxxxxxxxxx> wrote:

(Adding Paul McKenney who's good at this stuff)

> > v2:
> > - fix potential double-free of lease if second check finds conflict
> > - add smp_mb's to ensure that other CPUs see i_flock changes
> >
> > v3:
> > - remove smp_mb calls. Partial ordering is unlikely to help here.
>
> Forgive me here, I still don't understand. So to simplify massively,
> the situation looks like:
>	setlease		open
>	------------		------
>	atomic_read		atomic_inc
>	write i_flock		read i_flock
>	atomic_read

Are the three atomic ops reading the same value? If so, you can have smp_mb()
calls here:

	atomic_read		atomic_inc
	write i_flock		read i_flock

I *think* that memory accesses in one place need to be reverse-ordered with
respect to those in the other place, so:

	atomic_read		atomic_inc
	smp_mb()		smp_mb()
	write i_flock		read i_flock

doesn't achieve anything.
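
As a rough userspace illustration of what "reverse-ordered" could mean here
(this is one reading of the point, not kernel code): each side does its own
store first, then a full barrier, then loads what the other side stores. The
names writecount and flock_set are hypothetical stand-ins for the atomic
counter and i_flock, and the C11 seq_cst fence is only an analogue of
smp_mb().

```c
#include <stdatomic.h>

/* Hypothetical stand-ins, not the kernel's struct inode fields. */
static atomic_int writecount;	/* plays the role of the atomic counter */
static atomic_int flock_set;	/* plays the role of i_flock: 0 = none, 1 = lease */

/* setlease side: publish the lease, full fence, then check the counter. */
static int setlease_side(void)
{
	atomic_store_explicit(&flock_set, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* analogue of smp_mb() */
	return atomic_load_explicit(&writecount, memory_order_relaxed);
}

/* open side: bump the counter, full fence, then look for a lease. */
static int open_side(void)
{
	atomic_fetch_add_explicit(&writecount, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* analogue of smp_mb() */
	return atomic_load_explicit(&flock_set, memory_order_relaxed);
}
```

With the store before the fence and the load after it on both sides, the C11
model rules out the case where setlease_side() returns 0 and open_side() also
returns 0 when the two run concurrently. With the barrier placed between a
load and a later store instead, as in the diagram above, that case is not
excluded.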

> And we want to be sure that either the setlease caller sees the result
> of the atomic_inc, or the opener sees the result of the write to
> i_flock.
> As an example, suppose the above steps happen in the order:
>	atomic_read [A]
>	write i_flock [B]
>	atomic_read [C]
>	atomic_inc [X]
>	read i_flock [Y]

(I've labelled the operations for convenience)

> How do we know that the read of i_flock [Y] at the last step reflects the
> latest value of i_flock? For example, couldn't the write still be stuck in
> first CPU's cache?

Putting in memory barriers doesn't help here. If A, B and C are all performed
and committed to memory by the time X happens, then Y will see B, but C will
not see X.
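
To make that concrete, here is a minimal sketch that simply replays the
labelled order A, B, C, X, Y in a single thread, which models the case where
each step is fully performed before the next one starts. The struct and
variable names are made up for illustration.

```c
#include <stdbool.h>

/* What each labelled read observed in the replay. */
struct outcome {
	int a;			/* value seen by atomic_read [A] */
	int c;			/* value seen by atomic_read [C] */
	bool y_sees_lease;	/* whether read i_flock [Y] saw write [B] */
};

static struct outcome replay_abcxy(void)
{
	int writecount = 0;	/* hypothetical stand-in for the atomic counter */
	bool lease_set = false;	/* hypothetical stand-in for i_flock */
	struct outcome o;

	o.a = writecount;		/* [A] atomic_read */
	lease_set = true;		/* [B] write i_flock */
	o.c = writecount;		/* [C] atomic_read */
	writecount++;			/* [X] atomic_inc */
	o.y_sees_lease = lease_set;	/* [Y] read i_flock */
	return o;
}
```

In this replay both counter reads return 0 (C cannot see X because X has not
happened yet), while Y does see the lease written at B. No barrier changes
that outcome, since nothing here is reordered.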

The thing to remember is that smp_mb() is not a commit operation: it doesn't
cause a modification to be committed to memory. The reason you use it is to
make sure the CPU actually performs the preceding memory ops (e.g. makes the
modification in question) before it does any following memory ops.

Linux requires the system to be cache-coherent, so if the write is actually
performed by a CPU then the result will be obtained by any other CPU, no
matter whether it's still lingering in the first CPU's caches or whether it's
been passed on.


However, I could be wrong. Memory barriers are mind-bending to try and think
through, especially when the operations being ordered are R-W vs R-W rather
than having W-W on at least one side.

Hopefully Paul will be able to chime in.
