Re: [PATCH 2.6.3-rc2-mm1] __block_write_full patch

From: Andrew Morton
Date: Fri Feb 13 2004 - 18:51:50 EST


Daniel McNeil <daniel@xxxxxxxx> wrote:
>
> My only concern is a mpage_writepages(WB_SYNC_NONE) racing with a
> mpage_writepages(WB_SYNC_ALL) from a filemap_write_and_wait.
> Both could be processing the io_pages list, if the
> mpage_writepages(WB_SYNC_NONE) moves a page that has locked buffers
> back to the dirty_pages list, then when the filemap_write_and_wait()
> calls filemap_fdatawait, it will not wait for the page moved back
> to the dirty list.

Yes. I suspect we simply cannot get this right without insane locking.
We're trying to do something here which the writeback code simply does not
and cannot generally do, namely write and wait upon IO and dirtyings which
are initiated by other processes.

The best way to handle *all* this crap is to remove the address_space page
lists completely and replace all these things with radix tree walks, but I
never got onto that. Sad.

Maybe we could implement some form of per-address_space serialisation which
permits multiple WB_SYNC_NONE writers, but exclusive WB_SYNC_ALL writers.
That's basically an rwsem, but we don't want to block WB_SYNC_NONE
processes if there's a sync in progress.

So WB_SYNC_NONE callers would use down_read_trylock() and WB_SYNC_ALL
callers would use down_write(). That just fixes all this stuff up.
