Re: [BUG] increased us/sys-load due to tty-layer in 2.6.38+ ?!

From: Greg Kroah-Hartman
Date: Mon Apr 08 2013 - 11:06:24 EST


On Mon, Apr 08, 2013 at 11:25:58AM +0200, Steffen Trumtrar wrote:
> Hi!
>
> I noticed a problem with the tty subsystem on ARM. Starting with 2.6.38+, load
> on the serial connection causes a 10-15% increase in system/userspace load.
> This persists up to v3.9-rc4.
>
> The following setup was used:
>
> telnet && screen microcom -p /dev/ttyUSB0
>      |                                   +--------+
>      |------------->---------------------|----+   |
> +-------+<---------<---------------------|----+   |
> |       |          +------+              |        |
> |  UUT  |<--USB--->| FTDI |<----UART---->|        |
> |       |          +------+              |   PC   |
> +-------+                                +--------+
>     ^
>     |
> telnet && top -d1
>
> The unit under test (UUT) is connected via USB->FTDI->UART to a PC. On the PC,
> a "while true; do find /; done" produces some random output.
> I connect to the UUT via telnet and then open a serial connection to the PC
> in a screen session, so I see the output produced on the PC. Then the screen
> session gets detached. So, basically, what I'm trying to do is produce load
> only on the USB->FTDI->UART connection and not on the UUT itself.
> Then another telnet connection is opened to monitor the UUT with "top -d1".
> An i.MX27, a Kirkwood and an AT91 were used as UUTs.
>
> To find the "offending" code, I bisected v2.6.38..v3.0, which gave the following
> top output (non-scientific, I know, but the switch in load distribution is
> obvious nevertheless):
>
> 2.6.38            Cpu(s):  3.8%us,  1.9%sy,  0.0%ni, 94.3%id
> 2.6.38+           Cpu(s):  1.9%us,  3.8%sy,  0.0%ni, 94.3%id
> last good commit  Cpu(s):  1.9%us,  2.8%sy,  0.0%ni, 95.3%id
> first bad commit  Cpu(s):  4.8%us, 14.5%sy,  0.0%ni, 80.6%id
> 2.6.39-rc4        Cpu(s): 10.5%us,  8.9%sy,  0.0%ni, 79.8%id
> 3.0               Cpu(s): 15.9%us, 19.6%sy,  0.0%ni, 62.3%id
>
> This resulted in
> f23eb2b2b28547fc70df82dd5049eb39bec5ba12
> tty: stop using "delayed_work" in the tty layer
>
> as the possible cause. Reverting this commit by hand in v3.8 showed a load
> distribution similar to that of 2.6.38.
> What I haven't done is measure whether the load really increases or whether top
> only tells me so. Maybe the algorithm that calculates this somehow produces
> different results because of the switch from schedule_delayed_work to
> schedule_work?
> So, is this a bug, a feature, a symptom, ...?
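
As its title says, the commit stops deferring the tty buffer work by a timer
tick. A minimal sketch of the pattern in question, using the generic workqueue
API rather than the actual tty code (the names below are made up for
illustration):

#include <linux/workqueue.h>

/* Stand-in for the routine that pushes received bytes to the line discipline. */
static void flush_sketch(struct work_struct *work)
{
	/* hand buffered characters to the line discipline (details omitted) */
}

/* Old pattern: the flush is deferred by at least one jiffy. */
static DECLARE_DELAYED_WORK(flush_dwork, flush_sketch);

static void receive_chars_old(void)
{
	schedule_delayed_work(&flush_dwork, 1);	/* runs a timer tick later */
}

/* New pattern: the flush is queued to run as soon as a worker is free. */
static DECLARE_WORK(flush_work, flush_sketch);

static void receive_chars_new(void)
{
	schedule_work(&flush_work);		/* no artificial delay */
}

With the delayed variant, each burst of received data waits for the next timer
tick before it is processed; with plain work it is handled immediately, so the
same traffic can produce more frequent wakeups and a different-looking split
between user and system time.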

It's a "fake" load (i.e. no extra cpu is being used, just a "busy" wait
is happening.)

You should see increased throughput with that patch applied; have you
tested a real workload?
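
For a quick check of whether CPU time per transferred byte really goes up,
rather than relying on top's one-second snapshots, something along these lines
could be run on the UUT around a test window (a user-space sketch, not part of
the kernel change; the 60 second window is arbitrary):

#include <stdio.h>
#include <unistd.h>

struct cpu_times { unsigned long long user, nice, system, idle; };

/* Read the aggregate "cpu" line of /proc/stat. */
static int read_cpu(struct cpu_times *t)
{
	FILE *f = fopen("/proc/stat", "r");
	int n;

	if (!f)
		return -1;
	n = fscanf(f, "cpu %llu %llu %llu %llu",
		   &t->user, &t->nice, &t->system, &t->idle);
	fclose(f);
	return n == 4 ? 0 : -1;
}

int main(void)
{
	struct cpu_times a, b;

	if (read_cpu(&a))
		return 1;
	sleep(60);	/* run the serial load during this window */
	if (read_cpu(&b))
		return 1;
	printf("user %llu, system %llu, idle %llu jiffies\n",
	       b.user - a.user, b.system - a.system, b.idle - a.idle);
	return 0;
}

Comparing the user/system jiffy deltas for the same amount of transferred data
before and after the patch says more about real CPU cost than the percentages
quoted above.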

greg k-h