Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

From: Alexey Kuznetsov
Date: Thu Jun 28 2007 - 10:40:26 EST


Hello!

> the context-switch argument i'll believe if i see numbers. You'll
> probably need in excess of tens of thousands of irqs/sec to even be able
> to measure its overhead. (workqueues are driven by nice kernel threads
> so there's no TLB overhead, etc.)

It was the authors of the patch who were supposed to give some numbers,
at least one or two, just to prove the concept. :-)

According to my measurements (maybe wrong), on a 2.5GHz P4 a tasklet
schedule plus execution eats ~300ns, while a workqueue eats ~4usec.
On my 1.8GHz PM notebook (UP kernel), the numbers are 170ns and 1.2usec.

Although it formally looks awful, this result is positive: tasklets are
almost never used in hot paths. I am sure about only one such place: the
acenic driver uses a tasklet to refill the rx queue. This generates no more
than 3000 tasklet schedules per second. Even on the P4, pure workqueue
scheduling will eat only ~1% of bare cpu ticks.
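
The arithmetic behind the ~1% figure can be checked in a few lines. This is
only a sketch: it takes the per-schedule costs and the 3000/sec rate quoted
above as given, and the helper name cpu_fraction() is made up for
illustration, it is not anything from the kernel tree.

```python
# Rough CPU share consumed by deferred-work scheduling.
# Costs and rate are the figures quoted in this mail; cpu_fraction()
# is a hypothetical helper used only for this illustration.

def cpu_fraction(cost_ns, events_per_sec):
    """Fraction of one CPU-second spent on scheduling overhead."""
    return cost_ns * events_per_sec / 1e9

ACENIC_RATE = 3000  # tasklet schedules per second (upper bound from the mail)

# (tasklet_ns, workqueue_ns) per schedule + execution, as measured above
MEASURED = {
    "2.5GHz P4": (300, 4000),
    "1.8GHz PM (UP)": (170, 1200),
}

for cpu, (tasklet_ns, workqueue_ns) in MEASURED.items():
    t = cpu_fraction(tasklet_ns, ACENIC_RATE)
    w = cpu_fraction(workqueue_ns, ACENIC_RATE)
    print(f"{cpu}: tasklet {t:.2%}, workqueue {w:.2%}")
```

On the P4 this is 3000 * 4usec = 12ms per second, i.e. about 1.2% for the
workqueue against about 0.09% for the tasklet.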

Anyway, all the uses of tasklet should be verified:

The most dubious place is the popular Neterion 10Gbit driver, which uses
a tasklet the same way acenic does. But at 10Gbit, multiply the acenic
numbers and panic. :-)
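
To put a number on "multiply and panic", here is a sketch which assumes,
purely for illustration, that the schedule rate grows linearly with link
speed (acenic is a 1Gbit adapter, so a factor of 10). The real rate of the
Neterion driver is not measured here; interrupt mitigation and batching
could make it very different. The helper max_rate_for_budget() is a
hypothetical name.

```python
# Hypothetical scaling of the acenic figure to 10Gbit.
# ASSUMPTION: schedule rate scales linearly with link speed; not measured.

WORKQUEUE_NS_P4 = 4000   # per schedule, measured on 2.5GHz P4 (from this mail)
ACENIC_RATE = 3000       # schedules/sec at 1Gbit (from this mail)
SPEEDUP = 10             # 1Gbit -> 10Gbit, assumed linear

def max_rate_for_budget(cost_ns, budget):
    """Schedules/sec that fit into `budget` (a fraction of one CPU)."""
    return budget * 1e9 / cost_ns

scaled_rate = ACENIC_RATE * SPEEDUP
share = scaled_rate * WORKQUEUE_NS_P4 / 1e9

print(f"{scaled_rate} schedules/sec -> {share:.0%} of one P4")
print(f"1% budget allows {max_rate_for_budget(WORKQUEUE_NS_P4, 0.01):.0f} schedules/sec")
```

Under that assumption the workqueue overhead alone would be about 12% of
one P4, which is why this driver deserves real measurements first.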

Also, there is some hardware whose drivers use tasklets even harder,
but I have no idea what the real frequencies are: e.g. sundance.

The case of acenic/s2io is quite special: normally network drivers
refill queues in irq handlers. It was Jes Sorensen's observation
that offloading the refill from irq context improves performance; I do
not remember the numbers. Maybe switching to workqueues will not affect
performance at all, maybe it will just collapse; no idea.


> ... workqueues are also possibly much more scalable

I cannot figure out - scale in what direction? :-)


> (percpu workqueues
> are easy without changing anything in your code but the call where you
> create the workqueue).

I do not see how this is related to scalability. And the statement
does not even make sense. The patch already uses a per-cpu workqueue
for tasklets, otherwise it would be a disaster: guaranteed cpu non-locality.

A tasklet is single-threaded by definition and purpose. Those few places
where people used tasklets to do per-cpu jobs (RCU, e.g.) exist only because
they had trouble allocating a new softirq. Workqueues do not make
any difference: a tasklet is not a workqueue, it is a work_struct, and you
will still have to allocate an array of per-cpu work structs; everything
remains the same.


> the only remaining argument is latency:

You could set realtime priority by default, not a poor nice -5.
If some network adapters were killed just because I ran some task
with nice --22, it would be just ridiculous.

Alexey