Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur

From: Dave Chinner
Date: Wed Mar 18 2015 - 18:24:10 EST


On Wed, Mar 18, 2015 at 10:31:28AM -0700, Linus Torvalds wrote:
> On Wed, Mar 18, 2015 at 9:08 AM, Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > So why am I wrong? Why is testing for dirty not the same as testing
> > for writable?
> >
> > I can see a few cases:
> >
> > - your load has lots of writable (but not written-to) shared memory
>
> Hmm. I tried to look at the xfsprog sources, and I don't see any
> MAP_SHARED activity. It looks like it's just using pread64/pwrite64,
> and the only MAP_SHARED is for the xfsio mmap test thing, not for
> xfsrepair.
>
> So I don't see any shared mappings, but I don't know the code-base.

Right - all the mmap activity in the xfs_repair test is coming from
memory allocation through glibc - we don't use mmap() directly
anywhere in xfs_repair. FWIW, all the IO into these pages that are
allocated is being done via direct IO, if that makes any
difference...

> > - something completely different that I am entirely missing
>
> So I think there's something I'm missing. For non-shared mappings, I
> still have the idea that pte_dirty should be the same as pte_write.
> And yet, your testing of 3.19 shows that it's a big difference.
> There's clearly something I'm completely missing.

This level of pte interactions is beyond my level of knowledge, so
I'm afraid at this point I'm not going to be much help other than to
test patches and report the result.

FWIW, here's the distribution of the hash table we are iterating
over. There are a lot of search misses, which means we are doing a
lot of pointer chasing, but the distribution is centred directly
around the goal of 8 entries per chain and there is no long tail:

libxfs_bcache: 0x67e110
Max supported entries = 808584
Max utilized entries = 808584
Active entries = 808583
Hash table size = 101073
Hits = 9789987
Misses = 8224234
Hit ratio = 54.35
MRU 0 entries = 4667 ( 0%)
MRU 1 entries = 0 ( 0%)
MRU 2 entries = 4 ( 0%)
MRU 3 entries = 797447 ( 98%)
MRU 4 entries = 653 ( 0%)
MRU 5 entries = 0 ( 0%)
MRU 6 entries = 2755 ( 0%)
MRU 7 entries = 1518 ( 0%)
MRU 8 entries = 1518 ( 0%)
MRU 9 entries = 0 ( 0%)
MRU 10 entries = 21 ( 0%)
MRU 11 entries = 0 ( 0%)
MRU 12 entries = 0 ( 0%)
MRU 13 entries = 0 ( 0%)
MRU 14 entries = 0 ( 0%)
MRU 15 entries = 0 ( 0%)
Hash buckets with 0 entries 30 ( 0%)
Hash buckets with 1 entries 241 ( 0%)
Hash buckets with 2 entries 1019 ( 0%)
Hash buckets with 3 entries 2787 ( 1%)
Hash buckets with 4 entries 5838 ( 2%)
Hash buckets with 5 entries 9144 ( 5%)
Hash buckets with 6 entries 12165 ( 9%)
Hash buckets with 7 entries 14194 ( 12%)
Hash buckets with 8 entries 14387 ( 14%)
Hash buckets with 9 entries 12742 ( 14%)
Hash buckets with 10 entries 10253 ( 12%)
Hash buckets with 11 entries 7308 ( 9%)
Hash buckets with 12 entries 4872 ( 7%)
Hash buckets with 13 entries 2869 ( 4%)
Hash buckets with 14 entries 1578 ( 2%)
Hash buckets with 15 entries 894 ( 1%)
Hash buckets with 16 entries 430 ( 0%)
Hash buckets with 17 entries 188 ( 0%)
Hash buckets with 18 entries 88 ( 0%)
Hash buckets with 19 entries 24 ( 0%)
Hash buckets with 20 entries 11 ( 0%)
Hash buckets with 21 entries 10 ( 0%)
Hash buckets with 22 entries 1 ( 0%)


Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/