Re: [REGRESSION] suspend to ram fails in 6.2-rc1 due to tpm errors

From: Vlastimil Babka
Date: Tue Jan 10 2023 - 12:22:39 EST


On 1/9/23 17:08, Jason A. Donenfeld wrote:
> Hi Thorsten,
>
> On Thu, Jan 05, 2023 at 02:59:15PM +0100, Thorsten Leemhuis wrote:
>> On 29.12.22 05:03, Jason A. Donenfeld wrote:
>>> On Wed, Dec 28, 2022 at 06:07:25PM -0500, James Bottomley wrote:
>>>> On Wed, 2022-12-28 at 21:22 +0100, Vlastimil Babka wrote:
>>>>> Ugh, while the problem [1] was fixed in 6.1, it's now happening again
>>>>> on the T460 with 6.2-rc1. Except I didn't see any oops message or
>>>>> "tpm_try_transmit" error this time. The first indication of a problem
>>>>> is this during a resume from suspend to ram:
>>>>>
>>>>> tpm tpm0: A TPM error (28) occurred continue selftest
>>>>>
>>>>> and then periodically
>>>>>
>>>>> tpm tpm0: A TPM error (28) occurred attempting get random
>>>>
>>>> That's a TPM 1.2 error which means the TPM failed the selftest. The
>>>> original problem was reported against TPM 2.0 because of a missing
>>>> try_get_ops().
>>>
>>> No, I'm pretty sure the original bug, which was fixed by "char: tpm:
>>> Protect tpm_pm_suspend with locks" regards 1.2 as well, especially
>>> considering it's the same hardware from Vlastimil causing this. I also
>>> recall seeing this in 1.2 when I ran this with the TPM emulator. So
>>> that's not correct.
>>
>> James, are you or some other TPM developer looking into this? Or is this
>> deadlocked now? And if so: how can we get this unstuck to get this
>> regression solved?
>>
>> Side note: I wonder if the problem that Johannes reported yesterday in
>> this thread (
>> https://lore.kernel.org/all/Y7VCcgHUC6JtnO2b@xxxxxxxxx/
>> ) is related or something else, as it seems his issue happens with 6.1,
>> while Vlastimil's problems should be fixed there. Or am I missing something?
>
> So, this is now in rc3:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1382999aa0548a171a272ca817f6c38e797c458c
>
> That should help avoid the worst of the issue -- laptop not sleeping.
> But the race or whatever it is still does exist. So you might want to
> keep this in your tracker to periodically nudge the TPM folks about it.

Heh, I booted rc3 and managed to hit it on the very first suspend-to-RAM attempt:

tpm tpm0: A TPM error (28) occurred continue selftest

But thanks to the patch, the next suspend worked:

[ 236.598900] tpm tpm0: Error (28) sending savestate before suspend
[ 236.598915] tpm_tis 00:08: Ignoring error 28 while suspending

and on resume again:

[ 238.196645] tpm tpm0: A TPM error (28) occurred continue selftest

and indeed now I keep getting (as expected)

[ 399.671077] tpm tpm0: A TPM error (28) occurred attempting get random

So hopefully somebody will look into the root cause at some point.

> Jason