Re: [PATCH v2 RESEND] x86: optimize memcpy_flushcache

From: Ingo Molnar
Date: Thu Jun 21 2018 - 21:30:59 EST



* Mikulas Patocka <mpatocka@xxxxxxxxxx> wrote:

> On Thu, 21 Jun 2018, Ingo Molnar wrote:
>
> >
> > * Mike Snitzer <snitzer@xxxxxxxxxx> wrote:
> >
> > > From: Mikulas Patocka <mpatocka@xxxxxxxxxx>
> > > Subject: [PATCH v2] x86: optimize memcpy_flushcache
> > >
> > > In the context of constant short length stores to persistent memory,
> > > memcpy_flushcache suffers from a 2% performance degradation compared to
> > > explicitly using the "movnti" instruction.
> > >
> > > Optimize 4, 8, and 16 byte memcpy_flushcache calls to explicitly use the
> > > movnti instruction with inline assembler.
> >
> > Linus requested asm optimizations to include actual benchmarks, so it would be
> > nice to describe how this was tested, on what hardware, and what the before/after
> > numbers are.
> >
> > Thanks,
> >
> > Ingo
>
> It was tested on a 4-core Skylake machine with persistent memory being
> emulated using the memmap kernel option. The dm-writecache target used the
> emulated persistent memory as a cache and a SATA SSD as a backing device.
> The patch results in 2% improved throughput when writing data using dd.
>
> I don't have access to the machine anymore.

I think this information is enough, but do we know how well memmap emulation
represents true persistent memory speed and cache management characteristics?
It might be representative, but I don't know for sure, and probably neither
do most readers of the changelog.

So could you please put all this into an updated changelog, and also add a short
description that outlines exactly which codepaths end up using this method in a
typical persistent memory setup? All filesystem ops - or only reads, etc?
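
For readers without the patch at hand, here is a rough sketch of the kind of
constant-size fast path the changelog describes: 4, 8 and 16 byte copies go
straight to movnti, everything else takes the generic route. This is a
user-space illustration and not the code from the patch. The sketch_* names
are made up, the generic fallback is plain memcpy() standing in for the real
out-of-line routine, and the "memory" clobber is an assumption added for
user-space builds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the generic out-of-line copy routine. */
static void sketch_flushcache_generic(void *dst, const void *src, size_t cnt)
{
	memcpy(dst, src, cnt);
}

/* Order earlier non-temporal stores before later accesses (x86-64 only). */
static inline void sketch_store_fence(void)
{
#if defined(__x86_64__)
	__asm__ volatile("sfence" ::: "memory");
#endif
}

/*
 * Illustrative fast path: when the compiler can prove cnt is a constant
 * 4, 8 or 16, emit movnti directly; otherwise fall back to the generic copy.
 * Without optimization __builtin_constant_p() is typically 0, so the
 * fallback is taken and the result is still correct.
 */
static inline void sketch_memcpy_flushcache(void *dst, const void *src,
					     size_t cnt)
{
#if defined(__x86_64__)
	if (__builtin_constant_p(cnt)) {
		uint64_t lo, hi;
		uint32_t w;

		switch (cnt) {
		case 4:
			memcpy(&w, src, 4);
			__asm__ volatile("movnti %1, %0"
					 : "=m"(*(uint32_t *)dst)
					 : "r"(w) : "memory");
			return;
		case 8:
			memcpy(&lo, src, 8);
			__asm__ volatile("movnti %1, %0"
					 : "=m"(*(uint64_t *)dst)
					 : "r"(lo) : "memory");
			return;
		case 16:
			memcpy(&lo, src, 8);
			memcpy(&hi, (const char *)src + 8, 8);
			__asm__ volatile("movnti %1, %0"
					 : "=m"(*(uint64_t *)dst)
					 : "r"(lo) : "memory");
			__asm__ volatile("movnti %1, %0"
					 : "=m"(*(uint64_t *)((char *)dst + 8))
					 : "r"(hi) : "memory");
			return;
		}
	}
#endif
	sketch_flushcache_generic(dst, src, cnt);
}

/* Small self-check: returns 0 if all three sizes copy correctly. */
static int sketch_selfcheck(void)
{
	uint64_t src[2] = { 0x1122334455667788ULL, 0x99aabbccddeeff00ULL };
	uint64_t dst[2] = { 0, 0 };

	sketch_memcpy_flushcache(dst, src, 4);
	sketch_store_fence();
	if (memcmp(dst, src, 4) != 0 || dst[1] != 0)
		return 1;

	sketch_memcpy_flushcache(dst, src, 8);
	sketch_store_fence();
	if (dst[0] != src[0] || dst[1] != 0)
		return 2;

	sketch_memcpy_flushcache(dst, src, 16);
	sketch_store_fence();
	if (memcmp(dst, src, 16) != 0)
		return 3;

	return 0;
}
```

Note that the switch only helps callers that pass a compile-time constant
length, which is why the question of which codepaths actually do so matters
for the changelog.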

Thanks,

Ingo