Re: Re: i686 hang on boot in userspace

From: gmu 2k6
Date: Tue Jul 25 2006 - 10:48:14 EST

On 7/25/06, Jens Axboe <axboe@xxxxxxx> wrote:
On Tue, Jul 25 2006, gmu 2k6 wrote:
> On 7/25/06, Jens Axboe <axboe@xxxxxxx> wrote:
> >On Tue, Jul 25 2006, gmu 2k6 wrote:
> >> ok, let's nail it to 2.6.17-git5 instead as it survived git status
> >> compared to -git6, which seems to have booted correctly by accident
> >> the last time. timing issues I guess.
> >
> >I will try and reproduce it here now. It seems to be in between commit
> >271f18f102c789f59644bb6c53a69da1df72b2f4 and commit
> >dd67d051529387f6e44d22d1d5540ef281965fdd where the first one could also
> >be bad.
> >
> >I'm assuming that acf421755593f7d7bd9352d57eda796c6eb4fa43 should be
> >good, so you can try and verify that
> >dd67d051529387f6e44d22d1d5540ef281965fdd is bad and bisect between the
> >two. It's only about 6 commits, so should be quick enough to do.
> 1) no luck with remote serial console
> 2) netconsole does not work, although connecting to the listener with
> netcat and sending strings works
> I'm gonna try via a physical 9-pin RS-232 port and see how that works.
> afterwards I will try to bisect the revisions you mentioned.
> btw, the issue seems to come and go, as I managed to boot and log into
> a .17-git6 kernel, or it is timing-dependent.

I can reproduce it, you don't have to spend more time on bisecting or
testing. This should fix it:

diff --git a/drivers/block/cciss.c b/drivers/block/cciss.c
index 1c4df22..1eac041 100644
--- a/drivers/block/cciss.c
+++ b/drivers/block/cciss.c
@@ -1238,6 +1238,7 @@ static void cciss_softirq_done(struct re
 	CommandList_struct *cmd = rq->completion_data;
 	ctlr_info_t *h = hba[cmd->ctlr];
 	unsigned long flags;
+	request_queue_t *q;
 	u64bit temp64;
 	int i, ddir;

@@ -1260,10 +1261,13 @@ #ifdef CCISS_DEBUG
 	printk("Done with %p\n", rq);
 #endif /* CCISS_DEBUG */

+	q = rq->q;
 	spin_lock_irqsave(&h->lock, flags);
 	end_that_request_last(rq, rq->errors);
 	cmd_free(h, cmd, 1);
+	blk_start_queue(q);
 	spin_unlock_irqrestore(&h->lock, flags);

A better fix would rework the start_queue logic entirely in the driver,
but the above should get you running for now. I'll take a further look.

this four-liner seems to fix it:
- I can boot
- log in
- git-status works
- svn up works

as my last mail said, the 2nd patch (the one that introduced the new
function) hung cciss on driver init before printing any drive info.

btw, I assume you have systems with SmartArray 6* at your disposal to
test, right? I mean, SuSE should have some as a distro vendor.
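
For anyone reading along, here is a rough user-space sketch of the pattern
the four-liner follows in cciss_softirq_done: grab the queue pointer first,
complete the request, then restart the queue through the saved pointer.
Everything named mock_* is hypothetical and only stands in for the kernel
types and calls. The sketch assumes two things the patch does not spell
out: that the request must not be dereferenced once it has been completed,
and that the hang came from a queue left stopped with nothing to restart it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel's request queue and request. */
struct mock_queue {
	bool stopped;	/* true while the driver has the queue stopped */
	int restarts;	/* how many times the queue was restarted */
};

struct mock_request {
	struct mock_queue *q;	/* owning queue, like rq->q */
};

/* Models completion: after this call the request must not be touched. */
static void mock_end_request(struct mock_request *rq)
{
	free(rq);
}

/* Models blk_start_queue(): clear the stopped state so I/O can resume. */
static void mock_start_queue(struct mock_queue *q)
{
	q->stopped = false;
	q->restarts++;
}

/* Completion path with the same ordering as the patched function. */
static void mock_softirq_done(struct mock_request *rq)
{
	struct mock_queue *q = rq->q;	/* save before rq is released */

	/* the real driver holds h->lock with IRQs off around these calls */
	mock_end_request(rq);
	mock_start_queue(q);		/* saved pointer, never rq->q here */
}

/* Small driver for the sketch: returns the number of restarts seen. */
static int mock_demo(void)
{
	struct mock_queue q = { .stopped = true, .restarts = 0 };
	struct mock_request *rq = malloc(sizeof(*rq));

	assert(rq != NULL);
	rq->q = &q;
	mock_softirq_done(rq);
	assert(!q.stopped);
	return q.restarts;
}
```

Calling mock_demo() completes one request against a stopped queue and
reports one restart. Reading rq->q after mock_end_request() instead would
touch freed memory, which is presumably why the patch adds the local q.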
