Re: sys_write() racy for multi-threaded append?

From: Eric Dumazet
Date: Thu Mar 08 2007 - 19:15:42 EST


Michael K. Edwards a Ãcrit :
On 3/8/07, Eric Dumazet <dada1@xxxxxxxxxxxxx> wrote:
Nothing in the manuals says that write() on same fd should be non racy : In
particular file pos might be undefined. There is a reason pwrite() exists.

Kernel doesnt have to enforce thread safety as standard is quite clear.

I know the standard _allows_ us to crash and burn (well, corrupt
f_pos) when two threads race on an fd, but why would we want to?
Wouldn't it be better to do something at least slightly sane, like add
atomically to f_pos the expected number of number of bytes written,
then do the write, then fix it up (again atomically) if vfs_write
returns an unexpected pos?

Absolutely not. We dont want to slow down kernel 'just in case a fool might want to do crazy things'



Only O_APPEND case is specially handled (and NFS might fail to handle this
case correctly)

Is it? How?
mm/filemap.c

generic_write_checks()

if (file->f_flags & O_APPEND)
*pos = i_size_read(inode);

done while inode is locked.

O_APPEND basically says : Just ignore fpos and always use the 'end of file'

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/