Re: [PATCH 16/19] mm: numa: Add pte updates, hinting and migrationstats

From: Mel Gorman
Date: Wed Nov 07 2012 - 05:57:35 EST


On Tue, Nov 06, 2012 at 02:55:06PM -0500, Rik van Riel wrote:
> On 11/06/2012 04:14 AM, Mel Gorman wrote:
> >It is tricky to quantify the basic cost of automatic NUMA placement in a
> >meaningful manner. This patch adds some vmstats that can be used as part
> >of a basic costing model.
> >
> >u = basic unit = sizeof(void *)
> >Ca = cost of struct page access = sizeof(struct page) / u
> >Cpte = Cost PTE access = Ca
> >Cupdate = Cost PTE update = (2 * Cpte) + (2 * Wlock)
> > where Cpte is incurred twice for a read and a write and Wlock
> > is a constant representing the cost of taking or releasing a
> > lock
> >Cnumahint = Cost of a minor page fault = some high constant e.g. 1000
> >Cpagerw = Cost to read or write a full page = Ca + PAGE_SIZE/u
> >Ci = Cost of page isolation = Ca + Wi
> > where Wi is a constant that should reflect the approximate cost
> > of the locking operation
> >Cpagecopy = Cpagerw + (Cpagerw * Wnuma) + Ci + (Ci * Wnuma)
> > where Wnuma is the approximate NUMA factor. 1 is local. 1.2
> > would imply that remote accesses are 20% more expensive
> >
> >Balancing cost = Cpte * numa_pte_updates +
> > Cnumahint * numa_hint_faults +
> > Ci * numa_pages_migrated +
> > Cpagecopy * numa_pages_migrated
> >
> >Note that numa_pages_migrated is used as a measure of how many pages
> >were isolated even though it would miss pages that failed to migrate. A
> >vmstat counter could have been added for it but the isolation cost is
> >pretty marginal in comparison to the overall cost so it seemed overkill.
> >
> >The ideal way to measure automatic placement benefit would be to count
> >the number of remote accesses versus local accesses and do something like
> >
> > benefit = (remote_accesses_before - remove_access_after) * Wnuma
> >
> >but the information is not readily available. As a workload converges, the
> >expection would be that the number of remote numa hints would reduce to 0.
> >
> > convergence = numa_hint_faults_local / numa_hint_faults
> > where this is measured for the last N number of
> > numa hints recorded. When the workload is fully
> > converged the value is 1.
> >
> >This can measure if the placement policy is converging and how fast it is
> >doing it.
> >
> >Signed-off-by: Mel Gorman <mgorman@xxxxxxx>
>
> I'm skipping the ACKing of the policy patches, which
> appear to be meant to be placeholders for a "real"
> policy.

I do expect the MORON policy to disappear or at least change so much it
is not recognisable.

> However, you have a few more mechanism patches
> left in the series, which would be required regardless
> of what policy gets merged, so ...
>

Initially, I had the slow WSS sampling at the end because superficially
they could be considered an optimisation and I wanted to avoid sneaking
optimisations in. On reflection, the slow WSS sampling is pretty fundamental
and I've moved it earlier in the series like so;

mm: mempolicy: Add MPOL_MF_LAZY mm: mempolicy: Use _PAGE_NUMA to migrate pages
mm: numa: Add fault driven placement and migration
mm: sched: numa: Implement constant, per task Working Set Sampling (WSS) rate
mm: sched: numa: Implement slow start for working set sampling
mm: numa: Add pte updates, hinting and migration stats
mm: numa: Migrate on reference policy

--
Mel Gorman
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/