Re: [PATCH 2/3] Fix fsync livelock

From: Mikulas Patocka
Date: Mon Oct 06 2008 - 16:45:53 EST


On Mon, 6 Oct 2008, Arjan van de Ven wrote:

> On Mon, 6 Oct 2008 09:00:14 -0400 (EDT)
> Mikulas Patocka <mpatocka@xxxxxxxxxx> wrote:
>
> > On Sun, 5 Oct 2008, Arjan van de Ven wrote:
> >
> > > On Sun, 5 Oct 2008 23:30:51 -0400 (EDT)
> > > > The point is that many fsync()s may run in parallel and you have
> > > > just one inode and just one chain. And if you add two-word
> > > > list_head to a page, to link it on this list, many developers
> > > > will hate it for increasing its size.
> > >
> > > why to a page?
> > > a list head in the inode and chain up the bios....
> >
> > And if you want to wait for a bio submitted by a different process?
> > There's no way you can find the bio from the page.
>
> the point is that the kernel would always chain it to the inode,
> independent of who or when it is submitted

If you add a list to an inode, you need to protect it with a spinlock. So
you take one more spinlock for every write bio submitted, and a lot of
developers would hate that.

Another problem: how do you want to walk all dirty pages and submit bios
for them?

Allocating and submitting a bio can block (if some mempool runs out), and
in that case it waits until some other bio finishes. During this time,
more dirty pages can be created.

Also, if you find a page that is both dirty and under writeback, you need
to wait until that writeback finishes and then initiate another writeback
(because the old writeback may be writing stale data). You block again,
and more dirty pages can appear.

And if you block and more dirty pages appear, you are prone to the
livelock.
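
A minimal user-space sketch of that livelock, not kernel code. The names
toy_file and naive_sync, the NPAGES constant and the max_writes cutoff are
all hypothetical, and the "concurrent writer" is simulated by dirtying one
page each time the sync loop would have blocked on a write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 8

/* Hypothetical toy model: one dirty flag per page of a single file. */
struct toy_file {
    bool dirty[NPAGES];
};

/*
 * Naive sync: keep scanning until no dirty page is left.
 * Every simulated write "blocks", and while it blocks a simulated
 * writer dirties the page just behind the one being written.
 * Returns the number of writes done, or -1 if max_writes was reached
 * while dirty pages still existed (a bounded stand-in for a livelock).
 */
static long naive_sync(struct toy_file *f, long max_writes)
{
    long writes = 0;
    bool found;

    assert(f != NULL);

    do {
        found = false;
        for (size_t i = 0; i < NPAGES; i++) {
            if (!f->dirty[i])
                continue;
            found = true;
            f->dirty[i] = false;    /* write page i out */
            writes++;
            /* while we were blocked, another page became dirty */
            f->dirty[(i + NPAGES - 1) % NPAGES] = true;
            if (writes >= max_writes)
                return -1;          /* gave up, work never ran out */
        }
    } while (found);

    return writes;
}
```

With a single page dirty at the start, naive_sync(&f, 1000) gives up and
returns -1: every write leaves one new dirty page behind, so the loop
never sees a clean file no matter how large the cutoff is.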

[ Nick Piggin's patch needs to lock the whole address space, mark dirty
pages in one non-blocking pass, and then write the marked pages in a
second, blocking pass, so that if more dirty pages appear while bios are
being submitted, the new pages will be skipped ]
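
The mark-then-write idea can be sketched in user space as follows. This is
only a toy model under stated assumptions, not Nick Piggin's patch: the
toy_page struct, its "marked" flag and two_pass_sync are hypothetical
names, locking is left out, and the concurrent writer is simulated by
dirtying the next page after every write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 8

/* Hypothetical toy page: "marked" means "to be written by this sync". */
struct toy_page {
    bool dirty;
    bool marked;
};

/*
 * Pass 1 (non-blocking): mark every page that is dirty right now.
 * Pass 2 (blocking): write only marked pages. The simulated writer
 * dirties another page during every write, but such pages are not
 * marked, so this sync skips them.
 * Returns the number of pages written, which is at most NPAGES.
 */
static size_t two_pass_sync(struct toy_page pages[NPAGES])
{
    size_t writes = 0;

    assert(pages != NULL);

    for (size_t i = 0; i < NPAGES; i++)
        pages[i].marked = pages[i].dirty;

    for (size_t i = 0; i < NPAGES; i++) {
        if (!pages[i].marked)
            continue;
        pages[i].marked = false;
        pages[i].dirty = false;     /* write page i out */
        writes++;
        /* while we were blocked, the next page became dirty */
        pages[(i + 1) % NPAGES].dirty = true;
    }

    return writes;
}
```

The amount of work is fixed by the first pass, so the second pass ends
after at most NPAGES writes even though pages keep getting dirtied. Pages
dirtied in the meantime stay dirty and are left for a later sync.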

Mikulas