Re: [linux-pm] libata, hddtemp + s2ram is racy

From: Alan Stern
Date: Mon May 10 2010 - 10:04:26 EST


On Fri, 7 May 2010, Bruno Prémont wrote:

> Hi,
>
> On an SMP system I've hit a race condition between suspend and hddtemp
> checking the disk's temperature.
>
> System details:
> - Dual-core AMD Turion CPU
> - 00:12.0 SATA controller [0106]: ATI Technologies Inc SB600 Non-Raid-5 SATA [1002:4380]
> - hddtemp-0.3_beta15 (Gentoo package app-admin/hddtemp-0.3_beta15-r3)
> - Linus' tree shortly after v2.6.34-rc6, at commit
> be1066bbcd443a65df312fdecea7e4959adedb45 with some drm updates on
> top of it.
>
>
> It looked like hddtemp had sent the SMART request to the disk right
> before suspend, and during the suspend process ata2 complained without
> aborting the suspend (see below).
> After resume, access to that disk was deadlocked: any further attempt
> to suspend timed out while freezing hddtemp, and any access to that
> disk put userspace tasks into uninterruptible sleep and caused
> soft-raid to mark the disk as failed.
>
> Is suspend not waiting on SG_IO ioctls to complete (at ata host level)?

If you use the libata driver then the ATA I/O is handled by the SCSI
midlayer. The SCSI midlayer waits for the device's command queue to
drain completely before initiating a suspend.

Furthermore, the hddtemp process has to get frozen before the suspend
can begin, and I believe a process cannot be frozen while it is waiting
for an SG_IO to complete.
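
For reference, here is a minimal sketch of the kind of request under
discussion: a SMART READ DATA command sent through the SG_IO ioctl as an
ATA PASS-THROUGH (16) CDB. The function names, the 5-second timeout and
the 32-byte sense buffer are assumptions made for illustration; this is
not taken from the hddtemp source. The point is that the ioctl() call
is synchronous, so the caller sits in the kernel until the command
completes or times out.

```c
#include <assert.h>
#include <string.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

/* Hypothetical helper: fill a 16-byte ATA PASS-THROUGH (16) CDB that
 * asks the drive for its 512-byte SMART data sector. */
void build_smart_read_cdb(unsigned char cdb[16])
{
    assert(cdb != NULL);
    memset(cdb, 0, 16);
    cdb[0]  = 0x85;      /* ATA PASS-THROUGH (16) opcode */
    cdb[1]  = 4 << 1;    /* protocol 4: PIO Data-In */
    cdb[2]  = 0x0e;      /* T_DIR=from device, BYT_BLOK=1, T_LENGTH=2 */
    cdb[4]  = 0xD0;      /* features: SMART READ DATA */
    cdb[6]  = 1;         /* sector count: one 512-byte block */
    cdb[10] = 0x4F;      /* LBA mid: SMART signature */
    cdb[12] = 0xC2;      /* LBA high: SMART signature */
    cdb[14] = 0xB0;      /* ATA command: SMART */
}

/* Hypothetical helper: issue the command on an already opened disk
 * node (for example /dev/sda). Returns the ioctl() result; the call
 * does not return until the command has completed or timed out. */
int smart_read_data(int fd, unsigned char data[512])
{
    unsigned char cdb[16];
    unsigned char sense[32];
    sg_io_hdr_t hdr;

    build_smart_read_cdb(cdb);
    memset(sense, 0, sizeof(sense));
    memset(&hdr, 0, sizeof(hdr));

    hdr.interface_id    = 'S';               /* required by the sg v3 interface */
    hdr.dxfer_direction = SG_DXFER_FROM_DEV; /* data flows device -> host */
    hdr.cmdp            = cdb;
    hdr.cmd_len         = sizeof(cdb);
    hdr.dxferp          = data;
    hdr.dxfer_len       = 512;
    hdr.sbp             = sense;
    hdr.mx_sb_len       = sizeof(sense);
    hdr.timeout         = 5000;              /* milliseconds (assumed value) */

    return ioctl(fd, SG_IO, &hdr);           /* blocks in the kernel */
}
```

A caller would open the disk node read-only, pass the descriptor to
smart_read_data(), and then check both the return value and the status
fields in the header before trusting the data.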

At any rate, you should try posting this bug report on the linux-scsi
mailing list.

Alan Stern
