Re: [REGRESSION] hwmon: (applesmc) avoid overlong udelay()

From: Andreas Kemnade
Date: Tue Oct 06 2020 - 03:02:52 EST


On Thu, 1 Oct 2020 21:07:51 -0700
Guenter Roeck <linux@xxxxxxxxxxxx> wrote:

> On 10/1/20 3:22 PM, Andreas Kemnade wrote:
> > On Wed, 30 Sep 2020 22:00:09 +0200
> > Arnd Bergmann <arnd@xxxxxxxx> wrote:
> >
> >> On Wed, Sep 30, 2020 at 6:44 PM Guenter Roeck <linux@xxxxxxxxxxxx> wrote:
> >>>
> >>> On Wed, Sep 30, 2020 at 10:54:42AM +0200, Andreas Kemnade wrote:
> >>>> Hi,
> >>>>
> >>>> after the $subject patch I get lots of errors like this:
> >>>
> >>> For reference, this refers to commit fff2d0f701e6 ("hwmon: (applesmc)
> >>> avoid overlong udelay()").
> >>>
> >>>> [ 120.378614] applesmc: send_byte(0x00, 0x0300) fail: 0x40
> >>>> [ 120.378621] applesmc: LKSB: write data fail
> >>>> [ 120.512782] applesmc: send_byte(0x00, 0x0300) fail: 0x40
> >>>> [ 120.512787] applesmc: LKSB: write data fail
> >>>>
> >>>> CPU sticks at low speed and no fan is turning on.
> >>>> Reverting this patch on top of 5.9-rc6 solves this problem.
> >>>>
> >>>> Some information from dmidecode:
> >>>>
> >>>> Base Board Information
> >>>> Manufacturer: Apple Inc.
> >>>> Product Name: Mac-7DF21CB3ED6977E5
> >>>> Version: MacBookAir6,2
> >>>>
> >>>> Handle 0x0020, DMI type 11, 5 bytes
> >>>> OEM Strings
> >>>> String 1: Apple ROM Version. Model: MBA61. EFI Version: 122.0.0
> >>>> String 2: .0.0. Built by: root@saumon. Date: Wed Jun 10 18:
> >>>> String 3: 10:36 PDT 2020. Revision: 122 (B&I). ROM Version: F000_B
> >>>> String 4: 00. Build Type: Official Build, Release. Compiler: Appl
> >>>> String 5: e clang version 3.0 (tags/Apple/clang-211.10.1) (based on LLVM
> >>>> String 6: 3.0svn).
> >>>>
> >>>> Writing to things in /sys/devices/platform/applesmc.768 also gives the
> >>>> same errors.
> >>>> But writing 1 to fan1_manual and 5000 to fan1_output turns the fan on
> >>>> despite the error messages.
> >>>>
> >>> Not really sure what to do here. I could revert the patch, but then we'd gain
> >>> clang compile failures. Arnd, any idea ?
> >>
> >> It seems that either I made a mistake in the conversion and it sleeps for
> >> less time than before, or my assumption was wrong that converting a delay to
> >> a sleep is safe here.
> >>
> >> The error message indicates that the write fails, not the read, so that
> >> is what I'd look at first. Right away I can see that the maximum time to
> >> retry is only half of what it used to be, as we used to wait for
> >> 0x10, 0x20, 0x40, 0x80, ..., 0x20000 microseconds for a total of
> >> 0x3fff0 microseconds (262ms), while my patch went with the 131ms
> >> total delay based on the comment saying "/* wait up to 128 ms for a
> >> status change. */".
> >>
> > Yes, that is also what I read from the code. I just thought there must
> > be something simple, which just needs a short look from another pair of
> > eyes.
> >
> >> Since it is a sleeping wait, I see no reason the timeout couldn't
> >> be extended a lot, e.g. to a second, as in
> >>
> >> #define APPLESMC_MAX_WAIT 0x100000
> >>
> >> If that doesn't work, I'd try using mdelay() in place of
> >> usleep_range(), such as
> >>
> >> mdelay(DIV_ROUND_UP(us, USEC_PER_MSEC));
> >>
> >> This adds back a really nasty latency, but it should avoid the
> >> compile-time problem.
> >>
> >> Andreas, can you try those two things? (one at a time,
> >> not both)
> >
> > Ok, I tried. Neither of them works. I rechecked my work and created real
> > git commits out of them and CONFIG_LOCALVERSION_AUTO is also set so
> > the usual stupid things are ruled out.
> > In detail:
> > On top of 5.9-rc6 + *reverted* patch:
> > diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c
> > index fd99c9df8a00..2a9bd7f2b71b 100644
> > --- a/drivers/hwmon/applesmc.c
> > +++ b/drivers/hwmon/applesmc.c
> > @@ -45,7 +45,7 @@
> > /* wait up to 128 ms for a status change. */
> > #define APPLESMC_MIN_WAIT 0x0010
> > #define APPLESMC_RETRY_WAIT 0x0100
> > -#define APPLESMC_MAX_WAIT 0x20000
> > +#define APPLESMC_MAX_WAIT 0x8000
> >
> > #define APPLESMC_READ_CMD 0x10
> > #define APPLESMC_WRITE_CMD 0x11
> >
>
> Oh man, that code is so badly broken.
>
> send_byte() repeats sending the data if it was not immediately successful.
> That is done for both data and commands. Effectively that happens if
> the command is not immediately accepted. However, send_argument()
> clearly assumes that each data byte is sent exactly once. Sending
> it more than once will mess up the key that is supposed to be sent.
> The Apple SMC emulation code in qemu confirms that data bytes can not
> be written more than once.
>
> Of course, theoretically it may be that the first data byte was not
> accepted (after all, the ACK bit is not set), but the ACK bit is
> not checked again after udelay(APPLESMC_RETRY_WAIT), so it may
> well have been set in the 256 uS between its check and re-writing
> the data.
>
> In other words, this entire code only works accidentally to start with.
>
> If you like, you could play around with the code and find out if and
> when exactly bit 1 (busy) is set, if and when bit 2 (ack) is set, and
> if and when any other bit is set. We could also try to read port 0x31e
> (the error port). Maybe then we can figure out what the error actually
> is. But then I don't really know what we could do with that information.
>
Some research results: the second data byte seems to cause problems, not the
command byte.
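
A minimal sketch of how the status byte could be decoded while tracing
this per byte, assuming the bit numbering given above (bit 1 = busy,
bit 2 = ack). The macro and function names are hypothetical and not taken
from applesmc.c; this is plain userspace C, not driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical names; bit positions follow the description quoted above. */
#define SMC_STATUS_BUSY (1u << 1)	/* bit 1: busy */
#define SMC_STATUS_ACK  (1u << 2)	/* bit 2: ack */

static bool smc_busy(uint8_t status)
{
	return (status & SMC_STATUS_BUSY) != 0;
}

static bool smc_ack(uint8_t status)
{
	return (status & SMC_STATUS_ACK) != 0;
}

/* Format one trace line: which byte was sent and what status came back. */
static int smc_trace(char *buf, size_t len, int byte_idx, uint8_t status)
{
	unsigned int other = status & ~(SMC_STATUS_BUSY | SMC_STATUS_ACK) & 0xffu;

	assert(buf != NULL && len > 0);
	return snprintf(buf, len,
			"byte %d: status 0x%02x busy=%d ack=%d other=0x%02x",
			byte_idx, (unsigned int)status,
			smc_busy(status), smc_ack(status), other);
}
```

With the 0x40 from the log above, both helpers return false, so under
that assumption neither busy nor ack is set at the time of the failure.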

> Other than that, the only useful idea I have is something crazy like
>     if (us < 10000)
>         udelay(us);
>     else
>         mdelay(DIV_ROUND_CLOSEST(us, 1000));
> in the hope that clang doesn't convert that back into a
> compile-time constant and udelay().
>
> Overall it seems like the apple protocol may expect to receive data
> bytes faster than 1ms apart, because that is the only real difference
> between the original code and the new code using mdelay().

Yes, that explanation makes sense. When I try something like that, only
the last byte requires more than APPLESMC_MIN_WAIT. The longest wait I have
seen is 256us. So we could probably even use msleep for us > 1000 and udelay
for anything below.
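
A rough userspace sketch of that split, together with the backoff
arithmetic discussed earlier in the thread. The enum, the function names
and SLEEP_THRESHOLD_US are made up for illustration; in the driver,
WAIT_BUSY would correspond to udelay() and WAIT_SLEEP to msleep(). It only
models the decision and is not a tested patch.

```c
#include <assert.h>
#include <stdint.h>

/* Threshold from the text above: sleep for waits > 1000 us, busy-wait below. */
#define SLEEP_THRESHOLD_US 1000u

enum wait_kind { WAIT_BUSY, WAIT_SLEEP };

/* Decide how a single wait of 'us' microseconds would be performed. */
static enum wait_kind pick_wait(uint32_t us)
{
	return us > SLEEP_THRESHOLD_US ? WAIT_SLEEP : WAIT_BUSY;
}

/* Total of a doubling backoff: min_us, 2 * min_us, ..., up to max_us inclusive. */
static uint32_t backoff_total_us(uint32_t min_us, uint32_t max_us)
{
	uint32_t us, total = 0;

	assert(min_us > 0 && min_us <= max_us);
	for (us = min_us; us <= max_us; us <<= 1)
		total += us;
	return total;
}
```

For the old limits, backoff_total_us(0x10, 0x20000) gives 0x3fff0, the
262ms figure mentioned above. The 256us case falls on the busy-wait side,
and the first step that would sleep is 0x400.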

Regards,
Andreas
