VM deadlock in 2.4.11-pre4 (yes, it's me again :-)

From: Stephan von Krawczynski (skraw@ithnet.com)
Date: Sat Oct 06 2001 - 05:08:32 EST


Hello,

the time has come again for my monthly stress-testing of new kernels :-)
I installed 2.4.11-pre4 and ran it for a day. It was really strange for me to
see _no_ alloc failures at all, and that's why I gave it a closer look (alan
fooled me once by deleting the printk, which made the problem "disappear" too
:-). And here it is (from page_alloc.c):

 rebalance:
        page = balance_classzone(classzone, gfp_mask, order, &freed);
        if (page)
                return page;

        zone = zonelist->zones;
        if (likely(freed)) {
                for (;;) {
                        zone_t *z = *(zone++);
                        if (!z)
                                break;

                        if (zone_free_pages(z, order) > z->pages_min) {
                                page = rmqueue(z, order);
                                if (page)
                                        return page;
                        }
                }
                goto rebalance;
        } else {
                /*
                 * Check that no other task is been killed meanwhile,
                 * in such a case we can succeed the allocation.
                 */
                for (;;) {
                        zone_t *z = *(zone++);
                        if (!z)
                                break;

                        if (zone_free_pages(z, order) > z->pages_min) {
                                page = rmqueue(z, order);
                                if (page)
                                        return page;
                        }
                }

                goto rebalance;
        }

        printk(KERN_NOTICE "%s __alloc_pages: %u-order allocation failed
(gfp=0x%x/%i) from %p\n",
               current->comm, order, gfp_mask, !!(current->flags &
PF_MEMALLOC), __builtin_return_address(0));
/* my stuff :-) */
        if (order==0)
                show_trace(NULL);
        return NULL;
}

As you can see I patched some more info output on certain alloc-failures. But
unfortunately I did not read the lines above - up to now. As you can see the
printk cannot be reached at all, because both if-cases jump to rebalance - not
matter how comes.
When I wrote the first version of this posting, I wrote: this looks like a
possible deadlock to me, because it cycles forever when no pages are found. I
proved myself right this time, because when I tried to send the mail and
started a CD burn at the background, the host froze. As you may remember the CD
burns always gave me alloc-failures during startup in earlier kernel versions.
So it is pretty obvious I reached the deadlock this time.
Another thing I would like to kindly ask: what is the difference in the two if
branches? As I cannot read really well (already proven in another thread :-), I
am very willing to accept any explanation on this code ;-)
If someone else has asked the whole thing before: shoot me.

Regards,
Stephan

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Sun Oct 07 2001 - 21:00:41 EST