Re: parallel writes to the same file, 2.3.7

Stephen C. Tweedie (sct@redhat.com)
Wed, 23 Jun 1999 13:08:02 +0100 (BST)


Hi,

On 22 Jun 1999 16:44:47 GMT, torvalds@transmeta.com (Linus Torvalds)
said:

> Nope. I removed the semaphore, because it protected the wrong thing.

> The semaphore either has to protect _everything_ like it did in 2.2.x,
> or it has to protect the file length changes wrt buffer allocation.

Nope, we need to protect more than that. We have to protect the mapping
tree to prevent us from unmapping blocks which already have IO in
progress.

The problem we have right now in 2.3, unless I've missed something, is
that truncate is not synchronised at all with outstanding reads and
writes. Never mind allocations: think about what happens when you
submit a hundred writes to the inside of a file and another thread comes
along and truncates it. You can end up having writes outstanding to a
block which has been deleted on disk (and which may have been
reallocated).

You cannot even rely on the vmtruncate() locking in this case, because
there is a window in do_truncate() between the vmtruncate and the call
to the fs-specific truncate code. I think we can end up with:

Process 1 Process 2

sys_truncate():
down(sem);
adjust i_size;
vmtruncate();
ext2_truncate():
block for IO

< Mapping information is still present! >

sys_write(): to an already-allocated
buffer about to be truncated.
generic_file_write():
block_write_partial_page():
ext2_getblk_block();

continue ext2_truncate():
erase mapping information for write IO already submitted.

By this point we can end up reallocating the outstanding write blocks to
another file. Similar races in read() ought to be OK, as they are
protected against by the existing i_size assignment.

I _think_ that adding the inode semaphore around any writes beyond
i_size ought to eliminate this sort of problem.

On Tue, 22 Jun 1999 18:22:50 +0200 (CEST), Ingo Molnar
<mingo@chiara.csoma.elte.hu> said:

> check out block_*_write(), we get the inode semaphore when we allocate new
> blocks. So most races should be taken care of.

I can only see us taking the semaphore in truncate and fsync.

--Stephen

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/