Re: Regression - locking (all from 2.6.28)

From: Dave Hansen
Date: Fri Mar 06 2009 - 14:20:05 EST


On Fri, 2009-03-06 at 18:00 +0000, Catalin Marinas wrote:
> > I think you should be more worried about consistency rather than missing
> > entries. Take these two lines of code:
> >
> > start_pfn = node->node_start_pfn;
> > /* hotplug occurs here */
> > end_pfn = start_pfn + node->node_spanned_pages;
> >
> > What if someone comes in and adds memory to the node, at the beginning
> > of the node, after you have calculated start_pfn? Try to think of what
> > value you'll get for end_pfn and whether it is consistent and was *ever*
> > valid at all. Would that oops the kernel?
>
> I assume pfn_valid() should handle this and kmemleak wouldn't scan the
> page, unless we need locks around pfn_valid as well but I haven't seen
> any used in the kernel.

You assume incorrectly. :(

Take my above example, and assume that you have two nodes which are
right next to each other. You might run over the end of one node and
into the next one. Your pages will be pfn_valid() but you will be on
the wrong node.

Please take a look at those locks that I mentioned. Notice that they
are lock the pgdat *span*, not the validity of pages inside the pgdat.
Your code deals with what pages the pgdats *span* and thus needs that
lock. Notice that my example also had to do with those two lines of
code incorrectly guessing the pgdat's *span*.

We recently went to some pain to make sure that the software suspend
code (which walks pgdat ranges as well) worked with memory hotplug.
There really isn't that much code around that actually cares at runtime
about which physical areas a particular node or zone spans. Yours is a
rarity and will require some caution.

You could probably also use the memory hotplug mutex found here:

https://lists.linux-foundation.org/pipermail/linux-pm/2008-November/018884.html

But I'm not sure where those patches have gone. Hmmm. Pavel?

-- Dave

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/