Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849

From: Petr Vandrovec
Date: Thu Sep 22 2005 - 19:26:24 EST

Next message: Roman Zippel: "Re: [ANNOUNCE] ktimers subsystem"
Previous message: Alejandro Bonilla Beeche: "Re: R52 hdaps support?"
In reply to: Christoph Lameter: "Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849"
Next in thread: Petr Vandrovec: "Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Christoph Lameter wrote:

On Thu, 22 Sep 2005, Andrew Morton wrote:

Great, thanks. Christoph, was that patch the final official version?

This should deal with the node ownership issue. So yes.

I still have some open question on how pages ended up on the wrong node.
This should only happen if a zone / node has run out of memory. If pages ended up on the wrong node without that then there may be a different issue still to be fixed.

Maybe Petr can give us some more details on when the problem occurs?

Problem seems to happen immediately, and just first run of cache_reap
(2 seconds after eventd initializes if I understand it correctly) already
finds problem.

But I'm confused. I've just added code which is supposed to verify all
additions to the cache entry[] (http://platan.vc.cvut.cz/verify-all-entry-add.diff)
on the top of Christoph patch to catch one which later causes problem in cache_reap,
and it logs nothing at the time crash was happening :-( Only incident it logs is
"while (batchcount > 0)" loop in cache_alloc_refill, saying that

objp ffff81007ffd9430 belonging to the slab ffff81007ffd9000 which belongs
to node 1 was added to array_cache belonging to node 0 (called from
ffffffff8016e4a9) (mm/slab.c ~ line 2430)
... cache avc_node

This repeats couple of times, for avc_node, mnt_cache, proc_inode_cache
and bdev_cache. Nothing else.

So I've reverted your fix, and still I did not catch offender, so I'm probably
missing some place which populates array_cache entry[] :-(

Only if after I added logging to free_block() I was able to find that offender is
proc_inode_cache. But I have no idea how this object appeared in the incorrect
node cache...
Petr

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Roman Zippel: "Re: [ANNOUNCE] ktimers subsystem"
Previous message: Alejandro Bonilla Beeche: "Re: R52 hdaps support?"
In reply to: Christoph Lameter: "Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849"
Next in thread: Petr Vandrovec: "Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]