Re: Boot failure with ext2 and initrds
From: Mingming Cao
Date: Mon Nov 20 2006 - 20:48:09 EST
On Mon, 2006-11-20 at 16:19 +0000, Hugh Dickins wrote:
> On Thu, 16 Nov 2006, Andrew Morton wrote:
> > On Thu, 16 Nov 2006 12:15:16 -0800
> > Mingming Cao <cmm@xxxxxxxxxx> wrote:
> > > On Thu, 2006-11-16 at 01:13 -0800, Andrew Morton wrote:
> > > >
> > > > I assume that the while (1) loop in ext3_try_to_allocate_with_rsv() has
> > > > gone infinite. I don't see why, but more staring is needed.
> > >
> > > The loop should not go forever, it will stops when there is no window
> > > with free bit to reserve in the given block group.
> > It seems to have done so in Hugh's testing, but there's some question there
> > now. Although I didn't check to see if there's a significant difference
> > between Hugh's patch and mine.
> After four days of running, the EM64T has at last reproduced the same
> hang as it did in an hour before: stuck in ext2_try_to_allocate_with_rsv,
> repeatedly ext2_rsv_window_add, ext2_try_to_allocate, rsv_window_remove
> (perhaps not in that order).
> But somehow I've screwed up, and before I'd extracted any new info,
> kdb was spinning on its own kdb_printf_lock, and then the box completely
> frozen: nothing for it but to reboot.
> Well: at least this resolves the doubt I raised earlier, whether
> I'd been testing with the right ext2 patch. This time was definitely
> with your patch not my two-liner: your original patch from Tuesday 14 Nov,
> Various other patches have come into -mm (one gone) since then, but I've
> been running just with that: so far as I can see, none of the later ones
> should be important in my case - the filesystem is too small for the
> difference between int and ext2_fsblk_t to become important, and
> neither "ld" nor "as" (the writer this time) does MAP_SHARED pageout.
So there is only one writer at the moment the hang was happening?
hmm, is the filesystem relatively all being used or reserved, i.e, the
free bits are all being reserved? There is one extreme case that may
cause starvation. If filesystem free blocks are all being reserved, when
a new writer need a free block, it has to go through the entire
filesystems, try to reserve a space, which will repeatly calling
rsv_window_add and rsv_window_remove, since. Finally give up and fall
back to allocation without reservation. But this is all theory, not sure
fits your case here.
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/