Re: [RFC PATCH] sched: Invoke io_wq_worker_sleeping() with enabled preemption

From: Sebastian Andrzej Siewior
Date: Wed Aug 19 2020 - 09:33:42 EST


On 2020-08-19 15:15:07 [+0200], peterz@xxxxxxxxxxxxx wrote:

> > - if (tsk->flags & (PF_WQ_WORKER | PF_IO_WORKER)) {
> > + if (tsk->flags & PF_WQ_WORKER) {
> > preempt_disable();
> > - if (tsk->flags & PF_WQ_WORKER)
> > - wq_worker_sleeping(tsk);
> > - else
> > - io_wq_worker_sleeping(tsk);
> > + wq_worker_sleeping(tsk);
> > preempt_enable_no_resched();
> > }
> >
> > if (tsk_is_pi_blocked(tsk))
> > return;
> >
> > + if (tsk->flags & PF_IO_WORKER)
> > + io_wq_worker_sleeping(tsk);
> > +
>
> Urgh, so this adds a branch in what is normally considered a fairly hot
> path.
>
> I'm thinking that the raw_spinlock_t option would permit leaving that
> single:
>
> if (tsk->flags & (PF_WQ_WORKER | PF_IO_WORKER))
>
> branch intact?

The compiler generates code to test for both flags at once. If none of
both possible flags are set then there is one branch (get out and bring
me to tst_is_pi…).
And yes, with raw_spinlock_t we could keep that one branch.

If you want to optimize further, we could move PF_IO_WORKER to an lower
bit. x86 can test for both via
(gcc-10)
| testl $536870944, 44(%rbp) #, _11->flags
| jne .L1635 #,

(clang-9)
| testl $536870944, 44(%rbx) # imm = 0x20000020
| je .LBB112_6


but ARM can't and does
| ldr r1, [r5, #16] @ tsk_3->flags, tsk_3->flags
| mov r2, #32 @ tmp157,
| movt r2, 8192 @ tmp157,
| tst r2, r1 @ tmp157, tsk_3->flags
| beq .L998 @,

same ARM64
| ldr w0, [x20, 60] //, _11->flags
| and w0, w0, 1073741792 // tmp117, _11->flags,
| and w0, w0, -536870849 // tmp117, tmp117,
| cbnz w0, .L453 // tmp117,

using 0x10 for PF_IO_WORKER instead will turn this into:
| ldr w0, [x20, 60] //, _11->flags
| tst w0, 48 // _11->flags,
| bne .L453 //,

ARM:
| ldr r2, [r5, #16] @ tsk_3->flags, tsk_3->flags
| tst r2, #48 @ tsk_3->flags,
| beq .L998 @,

Sebastian