Re: sys_write() racy for multi-threaded append?

From: Eric Dumazet
Date: Fri Mar 09 2007 - 00:53:27 EST


Michael K. Edwards wrote:
> On 3/8/07, Eric Dumazet <dada1@xxxxxxxxxxxxx> wrote:
>> Absolutely not. We don't want to slow down the kernel 'just in case a fool
>> might want to do crazy things'
>
> Actually, I think it would make the kernel (negligibly) faster to bump
> f_pos before the vfs_write() call. Unless fget_light sets fput_needed
> or the write doesn't complete cleanly, you won't have to touch the
> file table entry again after vfs_write() returns. You can adjust
> vfs_write to grab f_dentry out of the file before going into
> do_sync_write. do_sync_write is done with the struct file before it
> goes into the aio_write() loop. Result: you probably save at least an
> L1 cache miss, unless the aio_write loop is so frugal with L1 cache
> that it doesn't manage to evict the struct file.
>
> Patch to follow.

Don't even try: you *cannot* do that without breaking the standards or taking a performance drop.

The only safe way would be to lock the file for the whole read()/write() syscall, and we don't want that (it would be more expensive than the current code).
Don't forget that 'file' may be a socket, a tty or whatever, not just a regular file.

The standards say:

If an error occurs, file pointer remains unchanged.

You cannot know in advance how many bytes will be written, since write() can return a count that differs from buflen.

So you cannot update f_pos before calling vfs_write().
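
To make the ordering argument concrete, here is a small user-space sketch, not kernel code. Everything prefixed with toy_ is a hypothetical stand-in made up for illustration: toy_vfs_write() plays the role of vfs_write() and takes an artificial 'limit' argument to force a short write or an error. The bump-first variant is deliberately naive and leaves out the fix-up that an unclean write would require, to show what f_pos holds until such a fix-up runs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Toy stand-in for the kernel's struct file: only the offset is modelled. */
struct toy_file {
    off_t f_pos;
};

/*
 * Hypothetical backend standing in for vfs_write(). It accepts at most
 * 'limit' bytes, or fails outright when 'limit' is negative. On success it
 * advances *pos by the number of bytes actually accepted.
 */
static ssize_t toy_vfs_write(size_t count, off_t *pos, long limit)
{
    if (limit < 0)
        return -EIO;                    /* error: *pos is left untouched */
    size_t done = count < (size_t)limit ? count : (size_t)limit;
    *pos += (off_t)done;
    return (ssize_t)done;
}

/* Current ordering: read f_pos, write, then store the resulting offset. */
static ssize_t toy_write_update_after(struct toy_file *f, size_t count, long limit)
{
    off_t pos = f->f_pos;
    ssize_t ret = toy_vfs_write(count, &pos, limit);
    f->f_pos = pos;
    return ret;
}

/* Naive bump-first ordering: advance f_pos by the requested count up front. */
static ssize_t toy_write_bump_first(struct toy_file *f, size_t count, long limit)
{
    off_t pos = f->f_pos;
    f->f_pos = pos + (off_t)count;      /* optimistic: assumes a full write */
    return toy_vfs_write(count, &pos, limit);
}

/* Usage example: compare both orderings on a short write and on an error. */
static void toy_demo(void)
{
    struct toy_file a = { .f_pos = 100 }, b = { .f_pos = 100 };

    /* Short write: only 20 of the 50 requested bytes are accepted. */
    assert(toy_write_update_after(&a, 50, 20) == 20 && a.f_pos == 120);
    assert(toy_write_bump_first(&b, 50, 20) == 20 && b.f_pos == 150);

    /* Failed write: the offset is supposed to stay where it was. */
    assert(toy_write_update_after(&a, 50, -1) == -EIO && a.f_pos == 120);
    assert(toy_write_bump_first(&b, 50, -1) == -EIO && b.f_pos == 200);
}
```

In the bump-first case the offset is wrong after both the short write and the error, so f_pos has to be written a second time to correct it, and other threads sharing the file can observe the wrong value in between.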

About your L1 'miss': don't forget that multi-threaded apps are going to do atomic_dec_and_test(&file->f_count) anyway when fput() runs at the end of the syscall. And you were concerned about multi-threaded apps, weren't you?
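
For illustration only, the dec-and-test pattern can be sketched in user space with C11 atomics. This is not the kernel's atomic_t implementation; struct toy_refd_file, toy_get() and toy_put() are hypothetical names, and only the reference count is modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for struct file; only the reference count is modelled. */
struct toy_refd_file {
    atomic_long f_count;
};

/* Take a reference, as syscall entry does when the file table is shared. */
static void toy_get(struct toy_refd_file *f)
{
    atomic_fetch_add(&f->f_count, 1);
}

/*
 * Drop a reference and report whether it was the last one. This is an
 * atomic read-modify-write on the struct itself, so the struct is written
 * at the end of the syscall no matter when f_pos was updated.
 */
static bool toy_put(struct toy_refd_file *f)
{
    return atomic_fetch_sub(&f->f_count, 1) == 1;
}

/* Usage example: one long-lived reference plus one per-syscall reference. */
static void toy_refcount_demo(void)
{
    struct toy_refd_file f;

    atomic_init(&f.f_count, 1);     /* reference held by the file table */
    toy_get(&f);                    /* syscall entry */
    assert(!toy_put(&f));           /* syscall exit: not the last reference */
    assert(atomic_load(&f.f_count) == 1);
}
```

The point of the sketch is that the per-syscall put is a store to the shared structure, so skipping the f_pos store after vfs_write() does not keep a threaded process from touching the struct file again.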
