Re: BUG: unable to handle kernel NULL pointer dereference at 0000000000000500

From: Zlatko Calusic
Date: Thu Dec 27 2012 - 19:33:04 EST


On 28.12.2012 01:24, Sedat Dilek wrote:
On Fri, Dec 28, 2012 at 12:51 AM, Zlatko Calusic
<zlatko.calusic@xxxxxxxx> wrote:
On 28.12.2012 00:42, Sedat Dilek wrote:

On Fri, Dec 28, 2012 at 12:39 AM, Zlatko Calusic
<zlatko.calusic@xxxxxxxx> wrote:

On 28.12.2012 00:30, Sedat Dilek wrote:


Hi Zlatko,

I am not sure if I hit the same problem as described in this thread.

Under heavy load, while building a customized toolchain for the Freetz
router project, I got a BUG || NULL pointer dereference || kswapd ||
zone_balanced || pgdat_balanced() etc. (see my screenshot for details).

I will try your patch from [1] ***only*** on top of my last
Linux-v3.8-rc1 GIT setup (post-v3.8-rc1 mainline + some net-fixes).


Yes, that's the same bug. It should be fixed with my latest patch, so I'd
appreciate you testing it, to be on the safe side this time. There should
be no difference if you apply it to anything newer than 3.8-rc1, so go for
it.
Thanks!


Not sure how I can reliably reproduce this bug, as one build worked fine
with my last v3.8-rc1 kernel.
I increased the number of parallel make jobs from "4" to "8" to stress it a
bit harder.
Just building right now... and will report.

If you have any test-case (script or whatever), please let me/us know.


Unfortunately not, I haven't reproduced it yet on my machines. But it seems
that the bug will hit only under heavy memory pressure: when close to OOM, or
possibly with lots of writing to disk. It's also possible that fragmentation
of memory zones could provoke it, which means testing it for a longer time.


I tested successfully by doing the following simultaneously...
- building Freetz with 8 parallel make-jobs
- building Linux GIT with 1 make-job
- 9 tabs open in firefox
- a YouTube music video playing in one tab
- etc.

I am reading [1] and [2] where another user reports success by reverting this...

commit cda73a10eb3f493871ed39f468db50a65ebeddce
"mm: do not sleep in balance_pgdat if there's no i/o congestion"

BTW, this machine also has 4GiB RAM (Ubuntu/precise AMD64).

Feel free to add a "Reported-by/Tested-by" if you think this is a
positive report.


Thanks for the testing! And keep running it in case something interesting pops up. ;)

No need to revert cda73a10eb because it fixes another bug. And the patch you're now running fixes the new bug I introduced with a combination of my latest 2 patches. Nah, it gets complicated... :)

But, at least I found the culprit and as soon as Linus applies the fix, everything will be hunky dory again, at least on this front. :P

Thanks,
--
Zlatko