[BUG] Lenovo x230: SATA errors with 180GB Intel 520 SSD under heavy write load

From: Mathieu Desnoyers
Date: Fri Feb 22 2013 - 18:11:24 EST


Hi,

We spent a couple of days cornering what appears to be an issue with the
Intel 520 SSD drives in Lenovo x230 laptops. It first showed up on a
clean Debian installation, while installing a guest operating system
into a VM. Looking around on forums, some people appear to be having
issues with database workloads too. So I decided to create a small
user-space program to reproduce the problem. IMPORTANT: Before you try
it, be ready for a system crash. It's available at:

git://git.efficios.com/test-ssd.git

Direct link to the .c file:
https://git.efficios.com/?p=test-ssd.git;a=blob;f=test-ssd-write.c;hb=refs/heads/master

This program simply performs random-access writes of 4KB blocks into a
single file.
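For readers who want to see the shape of this workload without fetching the repository, here is a minimal C sketch. It is not the code in test-ssd-write.c: the helper name random_write_4k, the xorshift64 generator, and the periodic fdatasync() controlled by a sync_every parameter are all assumptions made for illustration. Like the program described above, it issues 4KB writes at random, block-aligned offsets inside one file. Given this report, only run it against a drive you are prepared to see fail.

```c
#define _XOPEN_SOURCE 700 /* pwrite(), ftruncate(), fdatasync() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 4096 /* 4KB per write, as in the report */

/* xorshift64 PRNG (an assumption, not necessarily what the original uses):
 * deterministic for a given seed and not limited by RAND_MAX. */
static uint64_t xorshift64(uint64_t *state)
{
	uint64_t x = *state;

	x ^= x << 13;
	x ^= x >> 7;
	x ^= x << 17;
	*state = x;
	return x;
}

/*
 * Hypothetical helper: issue `count` 4KB writes at random block-aligned
 * offsets inside a file sized to `file_size` bytes (rounded down to a
 * multiple of BLOCK_SIZE), calling fdatasync() every `sync_every` writes
 * (0 disables syncing). Returns the number of writes completed, or -1
 * with errno set by the first call that failed.
 */
long random_write_4k(const char *path, off_t file_size, uint64_t seed,
		     long count, long sync_every)
{
	unsigned char buf[BLOCK_SIZE];
	off_t nblocks = file_size / BLOCK_SIZE;
	uint64_t state = seed ? seed : 1; /* xorshift state must be non-zero */
	long done;
	int fd, saved;

	assert(path != NULL);
	if (nblocks <= 0 || count < 0) {
		errno = EINVAL;
		return -1;
	}
	fd = open(path, O_WRONLY | O_CREAT, 0644);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, nblocks * BLOCK_SIZE) < 0)
		goto fail;

	for (done = 0; done < count; done++) {
		off_t block = (off_t)(xorshift64(&state) % (uint64_t)nblocks);
		ssize_t n;

		memset(buf, (int)(done & 0xff), sizeof(buf)); /* vary the payload */
		n = pwrite(fd, buf, sizeof(buf), block * BLOCK_SIZE);
		if (n != (ssize_t)sizeof(buf)) {
			if (n >= 0)
				errno = EIO; /* treat a short write as an I/O error */
			goto fail;
		}
		if (sync_every > 0 && (done + 1) % sync_every == 0 &&
		    fdatasync(fd) < 0)
			goto fail;
	}
	if (close(fd) < 0)
		return -1;
	return done;

fail:
	saved = errno;
	close(fd); /* may clobber errno, hence the save/restore */
	errno = saved;
	return -1;
}
```

For example, random_write_4k("somefileondisk", 209715200, 1234, 1000000, 256) would issue one million 4KB writes into a 200 MiB file; whether these values line up with the arguments of ./test-ssd-write is not stated here. Without periodic syncing (or O_DIRECT) many writes may only reach the page cache, which is why the sketch calls fdatasync(); it stops at the first failing call so the errno it returns reflects the first error that reached user space.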

Executive summary of our findings (the details are in the
test-ssd-write.c header in the git repo):

- We reproduced this issue on 4 x230 machines (all our x230s have 180GB
Intel drives, and they are all affected),
- We took an SSD from one of the machines, moved it into an x200, and the
problem still occurs,
- The problem seems to occur independently of the filesystem (reproduced
on ext3 and ext4),
- Problem reproduced by test-ssd-write.c (git tree above): After less
than 5 minutes of the heavy write workload, we get SATA errors and we
need to cold reboot the machine to access the drive again. Example
usage (don't forget to prepare for a computer freeze):

./test-ssd-write somefileondisk 209715200 1234 -z

(see options by just running ./test-ssd-write)

The problem occurs with drive model SSDSC2BW180A3L, with both firmware
versions LE1i and LF1i (those are Lenovo firmware releases). We could
reproduce the issue on the 3.2 (Debian), 3.5 (Debian), and 3.7.9 (Arch)
distribution kernels. We could reproduce it with x230 BIOS G2ET90WW
(2.50) 2012-12-20 and G2ET86WW (2.06) 2012-11-13, but since it can be
reproduced on an x200 too, it does not appear to be a BIOS issue.
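To check whether a given machine has the model and firmware combination named above, Linux exposes the identifiers in sysfs, e.g. /sys/block/sda/device/model and /sys/block/sda/device/rev (sda is an assumption; substitute the right device). The sketch below is hypothetical: read_attr and matches_reported_drive are illustrative names, not part of any existing tool. Because the SCSI INQUIRY product field holds only 16 characters, the model string shown there may be truncated, so the sketch only requires the substring SSDSC2BW18 and then checks the firmware revision against LE1i and LF1i. `smartctl -i` reports the untruncated model and firmware version if in doubt.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: read the first line of a sysfs attribute such as
 * /sys/block/sda/device/model into buf (a small buffer), dropping the
 * trailing newline and padding spaces. Returns 0 on success, -1 on failure.
 */
int read_attr(const char *path, char *buf, size_t len)
{
	FILE *f;
	size_t n;

	if (len < 2)
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	n = strlen(buf);
	while (n > 0 && (buf[n - 1] == '\n' || buf[n - 1] == ' '))
		buf[--n] = '\0';
	return 0;
}

/*
 * Hypothetical check against the combination named in the report:
 * model SSDSC2BW180A3L with Lenovo firmware LE1i or LF1i. The sysfs
 * model may be cut to 16 characters, so only a substring is required.
 */
int matches_reported_drive(const char *model, const char *rev)
{
	assert(model != NULL && rev != NULL);
	return strstr(model, "SSDSC2BW18") != NULL &&
	       (strcmp(rev, "LE1i") == 0 || strcmp(rev, "LF1i") == 0);
}
```

A caller would read both attributes for the device of interest and pass the results to matches_reported_drive(); a match only means the drive is one of the reported combinations, not that it will necessarily fail.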

We also tried the program on a range of other SSD drives, including one
with the same SandForce 2281 controller (details in the
test-ssd-write.c header). So our current guess is that the Lenovo
firmware on the SSD might be part of the problem, but it would be good
to confirm that Intel's own firmware works fine.

Thoughts, ideas, and hints about whom to contact regarding this issue
would be very much welcome.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com