Re: 2.6.24-rc4-mm1

From: Reuben Farrelly
Date: Tue Dec 11 2007 - 09:13:16 EST




On 11/12/2007 8:11 AM, Andrew Morton wrote:
On Tue, 11 Dec 2007 01:48:39 +1100
Reuben Farrelly <reuben-linuxkernel@xxxxxxxx> wrote:


On 5/12/2007 4:17 PM, Andrew Morton wrote:
Temporarily at

http://userweb.kernel.org/~akpm/2.6.24-rc4-mm1/

Will appear later at

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc4/2.6.24-rc4-mm1/


- Lots of device IDs have been removed from the e1000 driver and moved over
to e1000e. So if your e1000 stops working, you forgot to set CONFIG_E1000E.

- The s390 build is still broken.
I'm seeing this most incredibly unhelpful (to debug) but fortunately
reproduceable problem (so far 4/4 times) on this -mm kernel. I thought this problem may have been related to another bug which I have reported (A TCP oops) but even after applying a likely fix for that I am still seeing this problem.

The machine boots up perfectly fine and runs good until I load it up.
In this case I can reliably cause this to occur by pulling a 3G ISO across the
GigE network from my Linux box to my PC. After maybe 50M or so, the console just displays this (ignore initial boot banner):

----------

* Starting local ... [ ok ]


This is tornado.reub.net (Linux x86_64 2.6.24-rc4-mm1) 00:24:01

tornado login: *** buffer overf

-------

Yes - after displaying the 'f' in what I can only guess is the word 'overflow',
the box spontaneously reboots. There is no further console output until it starts to come back up again.

The problem does not exist in 2.6.23-gentoo kernels nor in a vanilla 2.6.24-rc4-git6 (phew!), so this looks to be an -mm only problem at this stage.

I enabled a number of kernel debugging options but then I got no output at all when the machine crashed.

I'm at a bit of a loss as to which subsystem this might be coming from, so I'm not sure who to CC.

Box information is (still) up at http://www.reub.net/files/kernel/2.6.24-rc4-mm1/


hm. grepping around for "buffer overflow" doesn't turn up anything except in
drivers which you won't be using on that machine.

I'd be suspecting networking, obviously. If you're feeling keen could you please
grep a 2.6.24-rc4 tree and apply 2.6.24-rc4-mm1's origin.patch and git-net.patch
and see if the bug is still present?

No - seems to be fine with just origin.patch and git-net.patch.

Just for good measure I then reverted git-net.patch and applied git-netdev-all.patch instead, and still wasn't able to trigger the reboot or console message, no matter how hard I tried.

I guess for now I'll sit on it, and if it appears in the next -mm it'll probably annoy me enough and inspire me to dig deeper (or, "guess" deeper, given the lack of direction as to where to even begin).

Reuben
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/