Re: [patch V2 00/20] timer: Refactor the timer wheel

From: Paul E. McKenney
Date: Mon Jun 20 2016 - 11:37:44 EST


On Fri, Jun 17, 2016 at 01:26:28PM -0000, Thomas Gleixner wrote:
> This is the second version of the timer wheel rework series. The first series
> can be found here:
>
> http://lkml.kernel.org/r/20160613070440.950649741@xxxxxxxxxxxxx
>
> The series is also available in git:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git WIP.timers

Ran some longer rcutorture tests, and the scripting complained about
hangs. This turned out to be due to the new wheel's up-to-12.5% expiry
uncertainty, so I fixed it by switching the rcutorture stop-test timer
to hrtimers. Things are now working as well as before, with the
exception of SRCU, for which I am getting lots of grace-period stall
complaints. This came as a bit of a surprise. Anyway, I will be
reviewing SRCU for timing dependencies.
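
For reference, a minimal sketch of the kind of conversion involved,
assuming the stop-test path only needs to sleep until a precise
absolute deadline (the function name is illustrative, not the actual
rcutorture change):

	#include <linux/hrtimer.h>
	#include <linux/sched.h>

	/*
	 * Sleep until an absolute deadline with hrtimer accuracy instead
	 * of relying on the (now coarser) timer wheel.
	 */
	static void stop_test_sleep_until(ktime_t deadline)
	{
		set_current_state(TASK_UNINTERRUPTIBLE);
		schedule_hrtimeout(&deadline, HRTIMER_MODE_ABS);
	}

The caller computes the deadline once, e.g. with
ktime_add_ms(ktime_get(), stop_ms), so the wakeup no longer inherits
the wheel's slack.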

Thanx, Paul

> Changes vs. V1:
>
> - Addressed the review comments of V1
>
> - Fixed the fallout in tty/metag (noticed by Arjan)
> - Renamed the hlist helper (noticed by Paolo/George)
> - Used the proper mask in get_timer_base() (noticed by Richard)
> - Fixed the inverse state check in internal_add_timer() (noticed by Richard)
> - Simplified the macro maze, removed wrapper (noticed by George)
> - Reordered data retrieval in run_timer() (noticed by George)
>
> - Removed cascading completely
>
> We now have a hard cutoff of expiry times at the capacity of the last
> wheel level. Timers which insist on timeouts longer than that will
> simply expire at the cutoff, i.e. after ~6 days. From our data
> gathering, the largest timeouts in the tree are 5 days (networking
> conntrack), which is well within that capacity.
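
A minimal sketch of the clamping idea, assuming a cutoff constant sized
to the last level's capacity (the name and the HZ=1000 value below are
illustrative, not lifted from the patch):

	#include <linux/jiffies.h>

	/* Illustrative: last-level capacity, ~6 days at HZ=1000. */
	#define WHEEL_TIMEOUT_CUTOFF	(64UL * 4 * (1UL << 21))

	static unsigned long clamp_wheel_timeout(unsigned long expires)
	{
		unsigned long delta = expires - jiffies;

		/* No cascading: over-long timeouts expire at the cutoff. */
		if (delta >= WHEEL_TIMEOUT_CUTOFF)
			expires = jiffies + WHEEL_TIMEOUT_CUTOFF - 1;
		return expires;
	}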
>
> To achieve this capacity with HZ=1000 without increasing the storage size
> by another level, we reduced the granularity of the first wheel level from
> 1ms to 4ms. According to our data, no user relies on the 1ms granularity,
> and 99% of those timers are canceled before they expire.
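
For concreteness, assuming 8 levels of 64 buckets with each level 8x
coarser than the one below (an assumption consistent with the numbers
above, not quoted from the patch), the capacity works out to:

	64 buckets * 4 ms * 8^7 = 536870912 ms ~= 6.2 days

which matches the ~6 day cutoff above.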
>
> As a side effect, the coarser first level batches timers better, which
> helps networking avoid rearming timers in the hot path, as the toy
> example below illustrates.
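
A toy userspace illustration of the batching effect (not the kernel's
code; names and values are made up): with 4ms buckets, expiries a few
milliseconds apart usually map to the same bucket, so a rearm between
them can be skipped.

	#include <stdio.h>

	#define LVL0_GRAN_MS 4	/* first-level granularity per this series */

	static unsigned long bucket(unsigned long expires_ms)
	{
		return expires_ms / LVL0_GRAN_MS;
	}

	int main(void)
	{
		unsigned long old = 1001, new = 1003;	/* expiries in ms */

		/* Same bucket: rearming from old to new would be a no-op. */
		printf("skip rearm: %s\n",
		       bucket(old) == bucket(new) ? "yes" : "no");
		return 0;
	}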
>
> We gathered more data about performance and batching. Compared to
> mainline, the following changes have been observed:
>
> - The bad outliers in mainline when the timer wheel needs to be forwarded
> after a long idle sleep are completely gone.
>
> - The total CPU time used for timer softirq processing is significantly
> reduced. Depending on the HZ setting and workload, this ranges from a
> factor of 2 to a factor of 6.
>
> - The average invocation period of the timer softirq on an idle system
> increases significantly. Depending on the HZ setting and workload, this
> ranges from a factor of 1.5 to a factor of 5, which means that residency
> in deep C-states should improve. We have not yet had time to verify this
> with the power tools.
>
> Thanks,
>
> tglx
>
> ---
> arch/x86/kernel/apic/x2apic_uv_x.c    |    4
> arch/x86/kernel/cpu/mcheck/mce.c      |    4
> block/genhd.c                         |    5
> drivers/cpufreq/powernv-cpufreq.c     |    5
> drivers/mmc/host/jz4740_mmc.c         |    2
> drivers/net/ethernet/tile/tilepro.c   |    4
> drivers/power/bq27xxx_battery.c       |    5
> drivers/tty/metag_da.c                |    4
> drivers/tty/mips_ejtag_fdc.c          |    4
> drivers/usb/host/ohci-hcd.c           |    1
> drivers/usb/host/xhci.c               |    2
> include/linux/list.h                  |   10
> include/linux/timer.h                 |   30
> kernel/time/tick-internal.h           |    1
> kernel/time/tick-sched.c              |   46 -
> kernel/time/timer.c                   | 1099 +++++++++++++++++++++---------------
> lib/random32.c                        |    1
> net/ipv4/inet_connection_sock.c       |    7
> net/ipv4/inet_timewait_sock.c         |    5
> 19 files changed, 725 insertions(+), 514 deletions(-)
>
>