Re: [RFC PATCH 0/2] RT scheduling policies for workqueues

From: Rasmus Villemoes
Date: Wed Apr 06 2022 - 09:19:01 EST


On 01/04/2022 11.21, Sebastian Andrzej Siewior wrote:
> On 2022-03-29 10:33:19 [+0200], Rasmus Villemoes wrote:
>> On 29/03/2022 08.30, Sebastian Andrzej Siewior wrote:
>>> On 2022-03-28 07:39:25 [-1000], Tejun Heo wrote:
>>>> Hello,
>>> Hi,
>>>
>>>> I wonder whether it'd be useful to provide a set of wrappers which can make
>>>> switching between workqueue and kworker easy. Semantics-wise, they're
>>>> already mostly aligned and it shouldn't be too difficult to e.g. make an
>>>> unbounded workqueue be backed by a dedicated kthread_worker instead of
>>>> shared pool depending on a flag, or even allow switching dynamically.
>>
>> Well, that would certainly not make it any easier for userspace to
>> discover the thread it needs to chrt().
>
> It should be configured within the tty-layer and not making a working RT
> just because it is possible.

I'm sorry, I can't parse that sentence.

The tty layer cannot possibly set the right RT priorities; only the
application/userspace/the BSP developer knows what is right. The kernel
has rightly standardized on just the two helpers sched_set_fifo() and
sched_set_fifo_low(); it is the admin who must configure the system, but
that also requires that the admin has access to knobs to actually do that.
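
E.g., when the irq thread is the thing doing the work, the knob is just
that thread's pid, and an init script can do something as simple as

    chrt -f -p 50 $(pgrep irq/30-uart)

(thread name and priority made up here; use whatever the platform's UART
irq thread is actually called), plus the corresponding chrt for the
consuming application.
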
>>
>> Here's another idea: In an ideal world, the irq thread itself [people
>> caring about latency use threaded interrupts] could just do the work
>> immediately - then the admin only has one kernel thread to properly
>> configure. However, as Sebastian pointed out, doing that leads to a
>> lockdep splat [1], and it also means that there's no work item involved,
>> so some other thread calling tty_buffer_flush_work() might not actually
>> wait for a concurrent flush_to_ldisc() to finish. So could we create a
>> struct hybrid_work { } which, when enqueued, does something like
>>
>> bool current_is_irqthread(void) { return in_task() &&
>> kthread_func(current) == irq_thread; }
>>
>> hwork_queue(struct hybrid_work *hwork, struct workqueue_struct *wq)
>>         if (current_is_irqthread()) {
>>                 task_work_add(current, &hwork->twork)
>>         } else {
>>                 queue_work(wq, &hwork->work);
>>         }
>>
>> (with extra bookkeeping so _flush and _cancel_sync methods can also be
>> created). It would require irqthread to learn to run its queued
>> task_works in its main loop, which in turn would require finding some
>> other way to do the irq_thread_dtor() cleanup, but that should be doable.
>>
>> While the implementation of hybrid_work might be a bit complex, I think
>> this would have potential for being used in other situations, and for
>> the users, the API would be as simple as the current workqueue/struct
>> kwork APIs. By letting the irq thread do more/all of the work, we'd
>> probably also win some latency due to fewer threads involved and better
>> cache locality. And the admin/BSP is already setting the rt priorities
>> of the [irq/...] threads.
>
> Hmmm. Sounds complicated. Especially the part where irqthread needs to
> deal with irq_thread_dtor in another way.

Well, we wouldn't need to use the task_work mechanism; we could also add
a list_head to struct irqaction {}, aka the irq thread's kthread_data().
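
Something like the below is roughly what I have in mind - completely
untested, the hwork_list member of struct irqaction and the helper names
are made up, current_is_irqthread() is as sketched earlier, and the
_flush/_cancel_sync bookkeeping is ignored for now:

struct hybrid_work {
        struct work_struct work;   /* fallback: ordinary work item */
        struct list_head node;     /* entry on the irq thread's own list */
        void (*func)(struct hybrid_work *hwork);
};

static void hwork_work_fn(struct work_struct *work)
{
        struct hybrid_work *hwork = container_of(work, struct hybrid_work, work);

        hwork->func(hwork);
}

static void hwork_queue(struct hybrid_work *hwork, struct workqueue_struct *wq)
{
        if (current_is_irqthread()) {
                /* an irq thread's kthread_data() is its struct irqaction */
                struct irqaction *action = kthread_data(current);

                list_add_tail(&hwork->node, &action->hwork_list);
        } else {
                queue_work(wq, &hwork->work);
        }
}

with some INIT_HYBRID_WORK() helper initializing the work_struct
(pointing it at hwork_work_fn), the list_head and the func callback.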

> If this is something we want for everyone and not just for the "low
> latency" attribute because it seems to make sense for everyone, would it
> work to add the data in one step and then flush it once all locks are
> dropped? The UART driver could be extended to a threaded handler if it
> is not desired/ possible to complete in the primary handler.

Yes, the idea is certainly to create something that is applicable more
generally than just to the tty problem. There are lots of places where
one ends up in the somewhat silly situation that the driver's irq
handler is carefully written to do little more than schedule a work
item, so with the -RT patch set we wake a task so it can wake a task so
it can ... It also means that the admin might have carefully adjusted
the rt priority of the irq/foobar kernel thread and of the consuming
application, but that doesn't help when there's some random SCHED_OTHER
task in between - i.e. exactly the tty problem.
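
Concretely, the irq thread would drain its list in its main loop once
the threaded handler has returned and all locks are dropped - roughly
(again just a sketch, with the made-up hwork_list from above):

/* in irq_thread()'s main loop, after the threaded handler has run */
while (!list_empty(&action->hwork_list)) {
        struct hybrid_work *hwork;

        hwork = list_first_entry(&action->hwork_list,
                                 struct hybrid_work, node);
        list_del_init(&hwork->node);
        hwork->func(hwork);
}

Within this sketch the list needs no locking, since it is only ever
touched by the irq thread itself; everybody else goes through the
workqueue fallback.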

I guess I should write some real patches to explain what I mean more
clearly.

Rasmus