Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849
From: Petr Vandrovec
Date: Thu Sep 22 2005 - 19:26:24 EST
Christoph Lameter wrote:
On Thu, 22 Sep 2005, Andrew Morton wrote:
Great, thanks. Christoph, was that patch the final official version?
This should deal with the node ownership issue. So yes.
I still have some open question on how pages ended up on the wrong node.
This should only happen if a zone / node has run out of memory. If pages
ended up on the wrong node without that then there may be a different
issue still to be fixed.
Maybe Petr can give us some more details on when the problem occurs?
Problem seems to happen immediately, and just first run of cache_reap
(2 seconds after eventd initializes if I understand it correctly) already
finds problem.
But I'm confused. I've just added code which is supposed to verify all
additions to the cache entry[] (http://platan.vc.cvut.cz/verify-all-entry-add.diff)
on the top of Christoph patch to catch one which later causes problem in cache_reap,
and it logs nothing at the time crash was happening :-( Only incident it logs is
"while (batchcount > 0)" loop in cache_alloc_refill, saying that
objp ffff81007ffd9430 belonging to the slab ffff81007ffd9000 which belongs
to node 1 was added to array_cache belonging to node 0 (called from
ffffffff8016e4a9) (mm/slab.c ~ line 2430)
... cache avc_node
This repeats couple of times, for avc_node, mnt_cache, proc_inode_cache
and bdev_cache. Nothing else.
So I've reverted your fix, and still I did not catch offender, so I'm probably
missing some place which populates array_cache entry[] :-(
Only if after I added logging to free_block() I was able to find that offender is
proc_inode_cache. But I have no idea how this object appeared in the incorrect
node cache...
Petr
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/