Re: The performance and behaviour of the anti-fragmentation related patches

From: Linus Torvalds
Date: Fri Mar 02 2007 - 14:05:05 EST




On Fri, 2 Mar 2007, Mark Gross wrote:
>
> I think there will be more than just 2 DIMMs per cpu socket on systems
> that care about this type of capability.

I agree. I think you'll have a nice mix of 2 and 4, although not likely a
lot more. You want to have independent channels, and then within a channel
you want to have as close to point-to-point as possible.

But the reason that I think you're better off looking at a "node level" is
that

(a) describing the DIMM setup is a total disaster. The interleaving is
part of it, but even in the absence of interleaving, we have so far
seen that describing DIMM mapping simply isn't a realistic thing to
be widely deployed, judging by the fact that we cannot even get a
first-order approximate mapping for the ECC error events.

Going node-level means that we just piggy-back on the existing node
mapping, which is a lot more likely to actually be correct and
available (ie you may not know which bank is bank0 and how the
interleaving works, but you usually *do* know which bank is connected
to which CPU package)

(Btw, I shouldn't have used the word "die", since it's really about
package - Intel obviously has a penchant for putting two dies per
package)

(b) especially if you can actually shut down the memory, going node-wide
may mean that you can shut down the CPUs too (ie per-package sleep).
I bet the people who care enough about DIMMs to want this would want
that *anyway*, so tying them together simplifies the problem.
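
As a rough illustration of the node-level idea in (a) and (b), and not
taken from any actual patch: the sketch below tracks power state per
node instead of per DIMM, so the only mapping it needs is the existing
node id. The struct, its fields and the function names are all
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-node bookkeeping; not a real kernel structure. */
struct node_power_domain {
	int node_id;              /* existing NUMA node id, reused as-is */
	unsigned long used_pages; /* pages still in use on this node */
	unsigned int busy_cpus;   /* CPUs in this package that are not idle */
};

/* Memory can only be shut down once nothing lives on the node. */
static bool node_memory_can_power_off(const struct node_power_domain *nd)
{
	assert(nd != NULL);
	return nd->used_pages == 0;
}

/* Per-package sleep: the memory is empty *and* every CPU is idle. */
static bool node_can_sleep(const struct node_power_domain *nd)
{
	return node_memory_can_power_off(nd) && nd->busy_cpus == 0;
}

/* Count how many nodes could be put to sleep as a whole right now. */
static size_t count_sleepable_nodes(const struct node_power_domain *nodes,
				    size_t n)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (node_can_sleep(&nodes[i]))
			count++;
	return count;
}
```

Nothing here needs to know which bank is bank0 or how the interleaving
works. The decision is made purely on what is attached to which
package.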

> BTW I hope we aren't talking past each other, there are low power states
> where the ram contents are preserved.

Yes. They are almost as hard to handle, but the advantage is that if we
get things wrong, it can still work most of the time (ie we don't have to
migrate everything off, we just need to try to migrate the stuff that gets
*used* off a DIMM, and hardware will hopefully end up quiescing the right
memory controller channel totally automatically, without us having to know
the exact mapping or even having to always get it 100% right).
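
A minimal sketch of that best-effort approach, assuming a simplified
model where each page records only the node it sits on and whether it
was recently used. The struct and both functions are hypothetical and
only illustrate the "move what gets used, leave the rest" idea, not how
the kernel actually migrates pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified view of a page. */
struct page_info {
	int node;           /* node the page currently lives on */
	bool recently_used; /* touched recently, so it keeps the channel busy */
};

/*
 * Move up to 'budget' recently used pages from node 'from' to node 'to'.
 * Cold pages stay where they are: their contents are preserved in the
 * low power state, so they do not need to move.
 * Returns the number of pages moved.
 */
static size_t migrate_hot_pages(struct page_info *pages, size_t n,
				int from, int to, size_t budget)
{
	size_t i, moved = 0;

	assert(from != to);
	for (i = 0; i < n && moved < budget; i++) {
		if (pages[i].node == from && pages[i].recently_used) {
			pages[i].node = to;
			moved++;
		}
	}
	return moved;
}

/* A node is "quiet" when no recently used page is left on it. */
static bool node_is_quiet(const struct page_info *pages, size_t n, int node)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (pages[i].node == node && pages[i].recently_used)
			return false;
	return true;
}
```

If the budget runs out before the node is quiet, nothing breaks: the
memory just stays out of the low power state a bit longer, and a later
pass can pick up the remaining hot pages.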

With FBDIMM in particular, I guess the biggest power cost isn't actually
the DRAM content, but just the controllers.

Of course, I wonder how much actual point there is to FBDIMMs once you
have on-die memory controllers and thus the reason for deep queueing is
basically gone (since you'd spread out the memory rather than having it
behind a few central controllers).

Linus