Re: 32GB SSD on USB1.1 P3/700 == ___HELL___ (2.6.34-rc3)

From: Bill Davidsen
Date: Thu Apr 08 2010 - 16:13:04 EST


Andreas Mohr wrote:
[CC'd some lucky candidates]

Hello,

I was just running
mkfs.ext4 -b 4096 -E stride=128 -E stripe-width=128 -O ^has_journal
/dev/sdb2
on my SSD18M connected via USB1.1, and the result was, well,
absolutely, positively _DEVASTATING_.

The entire system became _FULLY_ unresponsive, not even switching back
down to tty1 via Ctrl-Alt-F1 worked (took 20 seconds for even this key
to be respected).

Once back on ttys, invoking any command locked up for minutes
(note that I'm talking about attempted additional I/O to the _other_,
_unaffected_ main system HDD - such as loading some shell binaries -,
NOT the external SSD18M!!).

Having an attempt at writing a 300M /dev/zero file to the SSD's filesystem
was even worse (again tons of unresponsiveness), combined with multiple
OOM conditions flying by (I/O to the main HDD was minimal, its LED was
almost always _off_, yet everything stuck to an absolute standstill).

Clearly there's a very, very important limiter somewhere in bio layer
missing or broken, a 300M dd /dev/zero should never manage to put
such an onerous penalty on a system, IMHO.

You are using a USB 1.1 connection, about the same speed as a floppy. If you have not tuned your system to prevent all of the memory from being used to cache writes, it will be used that way. I don't have my notes handy, but I believe you need to tune the "dirty" parameters of /proc/sys/vm so that it makes better use of memory.

Of course putting a fast device like SSD on a super slow connection makes no sense other than as a test of system behavior on misconfigured machines.

I've got SysRq-W traces of these lockup conditions if wanted.


Not sure whether this is a 2.6.34-rc3 thing, might be a general issue.

Likely the lockup behaviour is a symptom of very high memory pressure.
But this memory pressure shouldn't even be allowed to happen in the first
place, since the dd submission rate should immediately get limited by the kernel's
bio layer / elevators.

Also, I'm wondering whether perhaps additionally there are some cond_resched()
to be inserted in some places, to try to improve coping with such a
broken situation at least.

Thanks,

Andreas Mohr


--
Bill Davidsen <davidsen@xxxxxxx>
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/