Re: [RFC] IPMI state machine regression

From: Andrew Banman
Date: Wed Aug 22 2018 - 12:24:00 EST


On Wed, Aug 22, 2018 at 11:14:52AM -0500, Corey Minyard wrote:
> On 08/21/2018 05:14 PM, Andrew Banman wrote:
> > Dear IPMI supporters,
> >
> > We observe a window in IPMI BT's opportunistic get-capabilities request
> > wherein GET_DEVICE_GUID and GET_DEVICE_ID requests may start while the BT state
> > machine is in WR_CONSUME. When that happens, bt_start_transaction forces the
> > 0xD5 error code, IPMI fails to initialize, and the interface is torn down.
> > There is no mechanism to retry bringing up the interface in open() of /dev/ipmi,
> > so IPMI stays hosed until you reload the modules. It appears to happen after
> > we call schedule().
>
> When was the latest kernel where this worked properly?  Also, what hardware
> is this?

This is UV4.

The first known bad commit is below, though I am not sure whether the timing
issue predates it:

commit aa9c9ab2443e3b9562c6c7cfc245a9e43b557d14
Author: Jeremy Kerr <jk@xxxxxxxxxx>
Date: Fri Aug 25 15:47:24 2017 +0800

ipmi: allow dynamic BMC version information

It hits less frequently with older kernels, so I didn't notice it until
recently, when it became more frequent.
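
To make the window concrete for anyone not familiar with the BT code, here is
a minimal sketch in C of the failure mode, under simplifying assumptions.
Everything in it (struct bt_model, the MODEL_* state names,
model_start_transaction() and model_step()) is hypothetical and is not the
driver's code. The model assumes a new request may only start from an idle
state; the 0xD5 rejection while in WR_CONSUME is what we actually observe.

```c
#include <assert.h>

/* Hypothetical, simplified model of a BT-style state machine. These names
 * are invented for illustration and are NOT the kernel's bt_sm.c. */
enum bt_model_state {
	MODEL_IDLE,
	MODEL_WRITE_BYTES,   /* host is writing the request into the buffer */
	MODEL_WRITE_CONSUME, /* request written; waiting for the BMC to take it */
	MODEL_READ_WAIT,     /* waiting for the BMC's response */
	MODEL_NUM_STATES
};

/* IPMI completion code D5h: command not supported in present state. */
#define MODEL_NOT_IN_STATE_ERR 0xD5

struct bt_model {
	enum bt_model_state state;
};

/* Begin a new request. Assumed legal only from IDLE; otherwise the request
 * is rejected immediately with 0xD5 and the in-flight one is left alone. */
static int model_start_transaction(struct bt_model *bt)
{
	assert(bt->state < MODEL_NUM_STATES);
	if (bt->state != MODEL_IDLE)
		return MODEL_NOT_IN_STATE_ERR;
	bt->state = MODEL_WRITE_BYTES;
	return 0;
}

/* Advance one step (e.g. one poll of the hardware). READ_WAIT returns to
 * IDLE once the response has been read; IDLE stays IDLE. */
static void model_step(struct bt_model *bt)
{
	assert(bt->state < MODEL_NUM_STATES);
	switch (bt->state) {
	case MODEL_WRITE_BYTES:
		bt->state = MODEL_WRITE_CONSUME;
		break;
	case MODEL_WRITE_CONSUME:
		bt->state = MODEL_READ_WAIT;
		break;
	case MODEL_READ_WAIT:
		bt->state = MODEL_IDLE;
		break;
	default:
		break;
	}
}
```

In this model, starting GET_DEVICE_ID, stepping once into WRITE_CONSUME, and
then starting GET_DEVICE_GUID makes the second call return 0xD5 while the
first request stays in flight. Since nothing in the open() path retries, that
single rejection is enough to lose the interface.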

>
> BTW, you can use the "hotmod" capability of the IPMI driver to add the
> device dynamically.
>
> -corey