Re: [patch 1/3] fix illogical behavior in balance_dirty_pages()

From: Miklos Szeredi
Date: Mon Mar 26 2007 - 05:21:33 EST


> > > > It also makes a deadlock possible when one filesystem is writing data
> > > > through another, and the balance_dirty_pages() for the lower
> > > > filesystem is stalling the writeback for the upper filesystem's
> > > > data (*).
> > >
> > > I still don't understand this one. I got lost when belatedly told that
> > > i_mutex had something to do with it.
> >
> > This deadlock only happens, if there's some bottleneck for writing
> > data to the lower filesystem. This bottleneck could be
> >
> > - i_mutex, preventing parallel writes to the same inode
> > - limited number of filesystem threads
> > - limited request queue length in the upper filesystem
> >
> > Imagine it this way: balance_dirty_pages() for the lower filesystem is
> > stalling a write() because dirty pages in the upper filesystem are
> > over the limit. Because there's a bottleneck for writing to the lower
> > filesystem, this is stalling _other_ writes from completing. So
> > there's no progress in writing back pages from the upper filesystem.
>
> You mean that someone is stuck in balance_dirty_pages() against the lower
> fs while holding locks which prevent writes into the upper fs from
> succeeding?
>
> Draw us a picture ;)

Well, not a picture, but a sort of indented call trace:

[some process, which has a fuse file writably mmaped]
  write fault on upper filesystem
    balance_dirty_pages
      loop...
        submit write requests

---------------------------------
[fuse loopback fs thread 1]
  read request from /dev/fuse
    sys_write
      mutex_lock(i_mutex)
      ...
      copy data to page cache
      balance_dirty_pages
        loop ...
          submit write requests
          write requests completed ...
          dirty still over limit ...
          ... loop forever

[fuse loopback fs thread 2]
  read request from /dev/fuse
    sys_write
      mutex_lock(i_mutex) blocks


The lower filesystem (e.g. ext3) has completed the single write
request that was sent to it, and then it's just looping in
balance_dirty_pages. The upper (fuse) filesystem has all the dirty
data (over the threshold), either still dirty or waiting in the
request queue as writeback.
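
The same situation can be sketched as a toy user-space model. This is
only an illustration under simplifying assumptions, not kernel code:
struct model, simulate() and the other names are made up, the page
counts are arbitrary, and the bottleneck is reduced to a single flag
standing in for i_mutex (or a limited thread pool or request queue).

```c
#include <stdbool.h>

/* Toy user-space model of the trace above. Not kernel code: every
 * name here is hypothetical and the numbers are arbitrary. */
struct model {
	int upper_dirty;  /* dirty + writeback pages of the upper (fuse) fs */
	int lower_dirty;  /* dirty pages of the lower fs (e.g. ext3) */
	int limit;        /* global dirty threshold */
	bool bottleneck;  /* e.g. i_mutex held by the stalled writer */
};

/* Lower fs writeback: its write requests always complete. */
static void lower_writeback(struct model *m)
{
	m->lower_dirty = 0;
}

/* Upper fs writeback: cleaning a page means a sys_write to the lower
 * fs, which cannot proceed while the bottleneck is taken. */
static void upper_writeback(struct model *m)
{
	if (m->bottleneck || m->upper_dirty == 0)
		return;           /* thread 2: mutex_lock(i_mutex) blocks */
	m->upper_dirty--;         /* one upper page written through ... */
	m->lower_dirty++;         /* ... and now dirty in the lower fs */
}

/* Thread 1 looping in balance_dirty_pages() for the lower fs.
 * Returns the number of iterations needed to get under the limit,
 * or -1 if it is still over the limit after max_iter rounds. */
static int balance_dirty_pages_model(struct model *m, int max_iter)
{
	for (int i = 0; i < max_iter; i++) {
		if (m->upper_dirty + m->lower_dirty <= m->limit)
			return i;
		lower_writeback(m);   /* completes, but does not help */
		upper_writeback(m);   /* the only thing that would help */
	}
	return -1;
}

int simulate(bool bottleneck)
{
	struct model m = {
		.upper_dirty = 100, .lower_dirty = 1,
		.limit = 50, .bottleneck = bottleneck,
	};
	return balance_dirty_pages_model(&m, 1000);
}
```

With the bottleneck in place, simulate(true) never gets under the limit
and returns -1: the lower filesystem's own writes complete, but the
upper filesystem's pages cannot drain. Without it, simulate(false)
terminates, because each round lets one upper page through.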

Does this help?

Miklos