Re: [PATCH] writeback: Don't wait for completion in writeback_inodes_sb_nr

From: Christoph Hellwig
Date: Wed Jun 29 2011 - 14:00:20 EST


On Wed, Jun 29, 2011 at 10:26:33AM -0700, Curt Wohlgemuth wrote:
> The semantics did change between 2.6.34 and 2.6.35 though. When the
> work item queue was introduced in 2.6.32, the semantics changed from
> what you describe above to what's present through 2.6.34:
> writeback_inodes_sb() would enqueue a work item, and return. Your
> commit 83ba7b07 ("writeback: simplify the write back thread queue")
> added the wait_for_completion() call, putting the semantics back to
> where they were pre-2.6.32.

Yes. The kernels in between had those nasty writeback vs umount races
that we could trigger quite often.

> Though one additional change between the old way (pre-2.6.32) and
> today: with the old kernel, the pdflush thread would operate
> concurrently with the first (and second?) sync path through writeback.
> Today of course, they're *all* serialized. So really a call to
> sys_sync() will enqueue 3 work items -- one from
> wakeup_flusher_threads(), one from writeback_inodes_sb(), and one from
> sync_inodes_sb().

Yes. And having WB_SYNC_NONE items from both wakeup_flusher_threads and
writeback_inodes_sb is plain stupid. The initial conversion to the
new sync_filesystem scheme had removed the wakeup_flusher_threads
call, but that caused a huge regression in some benchmark.

As mentioned before, Wu was working on some code to introduce tagging
so that the WB_SYNC_ALL call won't start writing pages dirtied after
the sync call, which should help with your issue. Although to
completely solve it we really need to get down to just two passes.
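
A minimal userspace sketch of the tagging idea, for illustration only.
It is not Wu's code and not a kernel API: struct page_model,
tag_dirty_pages() and write_tagged_pages() are hypothetical names, and a
flat array stands in for the real page cache. The assumption is that the
sync pass first tags whatever is dirty at that moment and then writes
only tagged pages, so pages dirtied after the tagging step are left for
a later pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a cached page; not a kernel structure. */
struct page_model {
	bool dirty;	/* page holds data not yet written back */
	bool towrite;	/* page was dirty when the sync pass started */
};

/* Tag every page that is dirty right now; returns how many were tagged. */
static size_t tag_dirty_pages(struct page_model *pages, size_t n)
{
	size_t tagged = 0;

	assert(pages != NULL);
	for (size_t i = 0; i < n; i++) {
		if (pages[i].dirty) {
			pages[i].towrite = true;
			tagged++;
		}
	}
	return tagged;
}

/*
 * Write back only the tagged pages. Pages dirtied after the tagging
 * step carry no tag, so this pass skips them and stays bounded.
 * Returns how many pages were written.
 */
static size_t write_tagged_pages(struct page_model *pages, size_t n)
{
	size_t written = 0;

	assert(pages != NULL);
	for (size_t i = 0; i < n; i++) {
		if (!pages[i].towrite)
			continue;
		pages[i].towrite = false;
		pages[i].dirty = false;	/* stands in for the actual I/O */
		written++;
	}
	return written;
}
```

In this model, a page dirtied between the two calls keeps its dirty flag
and is not counted by write_tagged_pages(), which is the behaviour the
tagging is meant to give the WB_SYNC_ALL pass.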