Re: Question about cacheline bouncing with percpu-rwsem and rcu-sync

From: Joel Fernandes
Date: Sat Jun 08 2019 - 20:29:24 EST

Next message: Finn Thain: "[PATCH v2 4/7] scsi: mac_scsi: Increase PIO/PDMA transfer length threshold"
Previous message: Nicolas Pitre: "Re: [PATCH v3] vt: Fix a missing-check bug in con_init()"
Next in thread: Paul E. McKenney: "Re: Question about cacheline bouncing with percpu-rwsem and rcu-sync"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Fri, May 31, 2019 at 10:43 AM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
[snip]
> >
> > Either way, it would be good for you to just try it. Create a kernel
> > module or similar that hammers on percpu_down_read() and percpu_up_read(),
> > and empirically check the scalability on a largish system. Then compare
> > this to down_read() and up_read().
>
> Will do! thanks.

I created a test for this, and the results are quite amazing. The test
just stresses read lock/unlock for rwsem vs. percpu-rwsem, and was run
on a dual-socket Intel x86_64 machine with 14 cores per socket.

The test runs 10,000,000 loops of read lock/unlock on rwsem and on percpu-rwsem:
https://github.com/joelagnel/linux-kernel/commit/8fe968116bd887592301179a53b7b3200db84424
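The actual test is the kernel module in the commit above; the following is not that code. It is a user-space sketch, under stated assumptions, that mimics only the reader fast paths with C11 atomics and pthreads: one shared counter stands in for rwsem's reader count (every reader writes it, so its cache line bounces between cores), and one cache-line-padded counter per thread stands in for percpu-rwsem's per-CPU read count. The names run_bench, worker and total_readers, the 64-byte cache line, and MAX_THREADS are hypothetical; no writer, rcu-sync, or sleeping is modeled.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

#define CACHELINE 64   /* assumed cache line size (hypothetical) */
#define MAX_THREADS 64 /* hypothetical upper bound on worker threads */

/* rwsem-like reader side: all readers add/sub one shared counter,
 * so the cache line holding it bounces between cores. */
static atomic_long shared_readers;

/* percpu-rwsem-like reader side: each thread owns a counter on its own
 * cache line, so readers never write a line another reader writes. */
struct padded_counter {
    _Alignas(CACHELINE) atomic_long count;
};
static struct padded_counter slot[MAX_THREADS];

struct worker_arg {
    int id;     /* index of this thread's private slot */
    long loops; /* number of read lock/unlock pairs */
    int percpu; /* 0 = shared counter, 1 = per-thread counter */
};

/* Each iteration models one "read lock" followed by one "read unlock". */
static void *worker(void *p)
{
    struct worker_arg *a = p;
    atomic_long *ctr = a->percpu ? &slot[a->id].count : &shared_readers;

    for (long i = 0; i < a->loops; i++) {
        atomic_fetch_add_explicit(ctr, 1, memory_order_acquire);
        atomic_fetch_sub_explicit(ctr, 1, memory_order_release);
    }
    return NULL;
}

/* Runs nthreads workers and returns elapsed wall-clock time in seconds. */
double run_bench(int nthreads, long loops, int percpu)
{
    pthread_t tid[MAX_THREADS];
    struct worker_arg args[MAX_THREADS];
    struct timespec t0, t1;

    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < nthreads; i++) {
        args[i] = (struct worker_arg){ .id = i, .loops = loops, .percpu = percpu };
        pthread_create(&tid[i], NULL, worker, &args[i]);
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

/* Sums every counter. A percpu-style writer has to do a sum like this
 * over all slots, which is where the per-CPU design moves its cost. */
long total_readers(void)
{
    long sum = atomic_load(&shared_readers);
    for (int i = 0; i < MAX_THREADS; i++)
        sum += atomic_load(&slot[i].count);
    return sum;
}
```

For example, a small main() could call run_bench(n, 10000000, 0) and run_bench(n, 10000000, 1) for increasing n and print both times (build with `cc -std=c11 -pthread`). If the effect is cacheline contention, the shared-counter time would be expected to grow with n while the per-thread time stays flatter, but absolute numbers will not match the kernel results.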

Graphs/Results here:
https://docs.google.com/spreadsheets/d/1cbVLNK8tzTZNTr-EDGDC0T0cnFCdFK3wg2Foj5-Ll9s/edit?usp=sharing

For rwsem, the completion time of the test grows roughly exponentially
with the number of threads, whereas for percpu-rwsem it stays the same.
I could add this data to some of the documentation as well.

Thanks!

- Joel
