Re: i_size still not SMP safe.

Alexander Viro (viro@math.psu.edu)
Thu, 5 Aug 1999 02:30:20 -0400 (EDT)


On Thu, 5 Aug 1999, Manfred Spraul wrote:

> Alexander Viro wrote:
> > > Description (everything for ext2):
> > > sys_write() calls generic_file_write() without acquiring the big kernel
> > > lock, and generic_file_write() uses f_pos and i_size without any locks.
> > > Note that O_APPEND is broken as well, and this might be the larger problem.
> >
> > I'm moving the modification of f_pos and i_size inside of the per-page
> > lock in the FAT patch. With FAT things get additionally tricky, since we
> > don't have holes (arrgh...). It is not enough for a complete solution,
> > though.
>
> I thought about that, it could become ugly: e.g., sys_llseek() and
> truncate would have to lock a page in the page-cache.
>
> And you should also check the thread "2.3 SMP overlapping writes and
> NFS".
> Basically, NFS v2-writes should be atomic. Even multi-page writes.

It sucks. Forget about NFS-exporting FAT, then.

> I think a list of all collisions will be the best option:
> * parallel writes and reads to different areas of the file will
> be possible.

They are.

> * parallel readdir()-calls should be possible, but I did not check this.

No, they cannot. Many filesystems rely on that *not* happening. Do it and
you break a lot of namespace-related code. IOW, forget it. Please, leave
directories out of it. That kind of mess is the last thing we need right
now - tons of new races over the whole namespace code.

> * it's not that complicated: e.g. a truncate is identical to a write:
> it needs exclusive access for the byte range (new EOF, current EOF).

You *do* know how badly the POSIX locks implementation sucks, right? Relying
on it will kill any performance.

> You don't need any new sync objects: only a normal spinlock for additions
> and removals from the collision list, and a wait queue for blocked threads.
> The collision list is similar to an extended semaphore: the "dec sem->count"
> is replaced with a check of all pending operations, so it's a very
> flexible and "fine grained" lock.

Too fine grained. You are adding unneeded overhead.
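
For reference, the collision list proposed in the quoted text could look roughly like the user-space sketch below. It is only an illustration under stated assumptions, not code from any patch in this thread: all names (collision_list, cl_try_begin, cl_end, MAX_PENDING) are hypothetical, a C11 atomic_flag stands in for the kernel spinlock, and the wait queue is left out, so a colliding caller gets -1 back where a real implementation would sleep and retry.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_PENDING 16          /* hypothetical cap on in-flight operations */

/* One pending operation on the byte range [start, end). */
struct pending_op {
    long start;
    long end;
    bool exclusive;             /* write or truncate; reads are shared */
    bool in_use;
};

struct collision_list {
    atomic_flag lock;           /* stand-in for the "normal spinlock" */
    struct pending_op ops[MAX_PENDING];
};

static void cl_lock(struct collision_list *cl)
{
    while (atomic_flag_test_and_set_explicit(&cl->lock, memory_order_acquire))
        ;                       /* spin until the flag is ours */
}

static void cl_unlock(struct collision_list *cl)
{
    atomic_flag_clear_explicit(&cl->lock, memory_order_release);
}

static void cl_init(struct collision_list *cl)
{
    atomic_flag_clear(&cl->lock);
    for (int i = 0; i < MAX_PENDING; i++)
        cl->ops[i].in_use = false;
}

/* Two operations collide if their ranges overlap and at least one is exclusive. */
static bool collides(const struct pending_op *op, long start, long end,
                     bool exclusive)
{
    bool overlap = op->start < end && start < op->end;
    return overlap && (op->exclusive || exclusive);
}

/*
 * Try to register an operation on [start, end). This replaces the
 * "dec sem->count" step with a check of all pending operations.
 * Returns a slot index on success, or -1 if the range collides with a
 * pending operation or the table is full. A real version would put the
 * caller on a wait queue instead of returning -1.
 */
static int cl_try_begin(struct collision_list *cl, long start, long end,
                        bool exclusive)
{
    int slot = -1;
    bool blocked = false;

    cl_lock(cl);
    for (int i = 0; i < MAX_PENDING; i++) {
        if (!cl->ops[i].in_use) {
            if (slot < 0)
                slot = i;       /* remember the first free slot */
        } else if (collides(&cl->ops[i], start, end, exclusive)) {
            blocked = true;
            break;
        }
    }
    if (blocked)
        slot = -1;
    if (slot >= 0)
        cl->ops[slot] = (struct pending_op){ start, end, exclusive, true };
    cl_unlock(cl);
    return slot;
}

/* Remove a finished operation; a real version would wake the wait queue here. */
static void cl_end(struct collision_list *cl, int slot)
{
    assert(slot >= 0 && slot < MAX_PENDING);
    cl_lock(cl);
    cl->ops[slot].in_use = false;
    cl_unlock(cl);
}
```

In this model a truncate from the current EOF down to a new EOF is just cl_try_begin(&cl, new_eof, cur_eof, true), the same call a write to that range would make. Note that every operation pays for a scan of all pending entries under the lock, which is the overhead objected to above.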
