RE: kexec and aacraid broken

From: Salyzyn, Mark
Date: Wed May 30 2007 - 07:44:28 EST


I believe this issue is a result of the aacraid_commit_reset patch (as
posted for scsi-misc-2.6, enclosed to permit testing) not yet propagated
to the 2.6.22-rc3 tree.

This is the adapter taking longer than 3 minutes to start after a reset.
I seriously doubt either of these patches suggested below will have an
affect. And if they do, they are not root cause, one reduces the chances
that the card will be reset during initialization (thus applied would
likely mitigate this problem), the other prevents a panic when the
Adapter is reset (removed, would result in dogs and cats sleeping with
each other).

Please use kernel parameter aacraid.startup_timeout=540 (merely larger
than the default 180 seconds) when spawning the kexec or see if the
aacraid_commit_reset.patch resolves the issue to confirm my hunch.

Sincerely -- Mark Salyzyn

> -----Original Message-----
> From: Andrew Morton [mailto:akpm@xxxxxxxxxxxxxxxxxxxx]
> Sent: Tuesday, May 29, 2007 10:14 PM
> To: Yinghai Lu
> Cc: Vivek Goyal; Eric W. Biederman; AACRAID; Linux Kernel
> Mailing List; linux-scsi@xxxxxxxxxxxxxxx; Michal Piotrowski
> Subject: Re: kexec and aacraid broken
>
>
> On Tue, 29 May 2007 18:59:32 -0700 "Yinghai Lu"
> <yhlu.kernel@xxxxxxxxx> wrote:
>
> > latest tree, can not use kexec to load 2.6.22-rc3 at least.
> >
> > got:
> >
> > AAC0: adapter kernel panic'd fffffffd
> > AAC0: adapter kernel failed to start, init status=0
>
> One of the two diffs below, I guess. Please do a `patch -R
> -p1' of this
> email and retest?
>
> >
> > but can load 2.6.21.3
> >
>
> Michal, can you please add this to the regression list?
>
>
>
>
> commit 9e4d4a5d71d673901d9c1df5146ce545c2cc0cc0
> Author: Salyzyn, Mark <mark_salyzyn@xxxxxxxxxxx>
> Date: Tue May 1 11:43:06 2007 -0400
>
> [SCSI] aacraid: superfluous adapter reset for IBM 8
> series ServeRAID controllers
>
> The kexec patch introduced a superfluous (and otherwise
> inert) reset of
> some adapters. The register can have a hardware default
> value that has
> zeros for the undefined interrupts. This patch refines
> the test of the
> interrupt enable register to focus on only the interrupts
> that affect
> the driver in order to detect if an incomplete shutdown
> of the Adapter
> had occurred (kdump).
>
> Signed-off-by: Mark Salyzyn <aacraid@xxxxxxxxxxx>
> Signed-off-by: James Bottomley <James.Bottomley@xxxxxxxxxxxx>
>
> diff --git a/drivers/scsi/aacraid/rx.c b/drivers/scsi/aacraid/rx.c
> index b6ee3c0..291cd14 100644
> --- a/drivers/scsi/aacraid/rx.c
> +++ b/drivers/scsi/aacraid/rx.c
> @@ -542,7 +542,7 @@ int _aac_rx_init(struct aac_dev *dev)
> dev->a_ops.adapter_sync_cmd = rx_sync_cmd;
> dev->a_ops.adapter_enable_int = aac_rx_disable_interrupt;
> dev->OIMR = status = rx_readb (dev, MUnit.OIMR);
> - if ((((status & 0xff) != 0xff) || reset_devices) &&
> + if ((((status & 0x0c) != 0x0c) || reset_devices) &&
> !aac_rx_restart_adapter(dev, 0))
> ++restart;
> /*
> commit a5694ec545a880f9d23463fddc894f5096cc68fa
> Author: Salyzyn, Mark <mark_salyzyn@xxxxxxxxxxx>
> Date: Mon Apr 30 13:22:24 2007 -0400
>
> [SCSI] aacraid: kexec fix (reset interrupt handler)
>
> Another layer on this onion also discovered by Duane, the
> interrupt enable handler also needed to be set ... The
> interrupt enable
> was called from within the synchronous command handler.
>
> Signed-off-by: Mark Salyzyn <aacraid@xxxxxxxxxxx>
> Signed-off-by: James Bottomley <James.Bottomley@xxxxxxxxxxxx>
>
> diff --git a/drivers/scsi/aacraid/rx.c b/drivers/scsi/aacraid/rx.c
> index 0c71315..b6ee3c0 100644
> --- a/drivers/scsi/aacraid/rx.c
> +++ b/drivers/scsi/aacraid/rx.c
> @@ -539,6 +539,8 @@ int _aac_rx_init(struct aac_dev *dev)
> }
>
> /* Failure to reset here is an option ... */
> + dev->a_ops.adapter_sync_cmd = rx_sync_cmd;
> + dev->a_ops.adapter_enable_int = aac_rx_disable_interrupt;
> dev->OIMR = status = rx_readb (dev, MUnit.OIMR);
> if ((((status & 0xff) != 0xff) || reset_devices) &&
> !aac_rx_restart_adapter(dev, 0))
>
>

Attachment: aacraid_commit_reset.patch
Description: aacraid_commit_reset.patch