Re: [Bug 18252] spinlock lockup in __make_request <- submit_bio <- ondemand_readahead

From: Andrew Morton
Date: Mon Sep 13 2010 - 17:42:33 EST


On Sat, 11 Sep 2010 11:50:41 +0200
Stefan Richter <stefanr@xxxxxxxxxxxxxxxxx> wrote:

> Full quote for lkml:
>
> bugzilla-daemon@xxxxxxxxxxxxxxxxxxx wrote:
> > https://bugzilla.kernel.org/show_bug.cgi?id=18252
> >
> > Summary: spinlock lockup in __make_request <- submit_bio <-
> > ondemand_readahead
> > Product: IO/Storage
> > Version: 2.5
> > Kernel Version: 2.6.36-rc3
> > Platform: All
> > OS/Version: Linux
> > Tree: Mainline
> > Status: NEW
> > Severity: normal
> > Priority: P1
> > Component: Block Layer
> > AssignedTo: axboe@xxxxxxxxx
> > ReportedBy: stefanr@xxxxxxxxxxxxxxxxx
> > Regression: No
> >
> >
> > Created an attachment (id=29562)
> > --> (https://bugzilla.kernel.org/attachment.cgi?id=29562)
> > BUG screenshot
> >
> > After a week of uptime on 2.6.36-rc3 (I ran 2.6.35 before that),
>
> Almost two weeks of uptime, actually.
>
> > I was greeted by a black screen of death this morning:
> >
> > (see screenshot in attachment; partial transcript:)
> >
> > sending NMI to all CPUs:
> > BUG: spinlock lockup on CPU#0, ktorrent/4313, ffff8802...
> > PID: 4313, comm: ktorrent Tainted: G M D W 2.6.36-rc3 #3
> > Call Trace:
> > [...] do_raw_spin_lock+0x118/0x147
> > [...] _raw_spin_lock_irq+0x44/0x49
> > [...] ? __make_request+0x5c/0x400
> > [...] __make_request+0x5c/0x400
> > [...] generic_make_request+0x23a/0x2a9
> > [...] submit_bio+0xad/0xb6
> > [...] mpage_bio_submit...
> > [...] do_mpage_readpage...
> > [...] ? get_parent_ip...
> > [...] ? sub_preempt_count...
> > [...] ? __lru_cache_add...
> > [...] mpage_readpages...
> > [...] ? ext4_get_block...
> > [...] ? __alloc_pages_nodemask...
> > [...] ? ext4_get_block...
> > [...] ext4_readpages...
> > [...] __do_page_cache_readahead...
> > [...] ? __do_page_cache_readahead...
> > [...] ra_submit...
> > [...] ondemand_readahead...
> >
> > This is a system with Phenom II x4 and Radeon graphics. Since kernel mode
> > setting is fairly new for radeon, it is possible that the lockup happened with
> > earlier kernels too but simply ended in a lockup without trace dump to the
> > screen. IOW, it is not clear to me whether this is a regression or not.
> >
> > The bug happened while kaffeine wrote an MPEG 2 TS to the same filesystem from
> > which ktorrent was reading. Of course this kind of commonplace workload
> > happened without problem two or three times before during the week in which I
> > ran 2.6.36-rc3.
> >
>
> (The screenshot is a bit large, hence I reported in bugzilla instead of the list.)
>

What you've quoted above appears to be just the aftermath.
https://bugzilla.kernel.org/attachment.cgi?id=29562 indicates that the
kernel earlier crashed in scsi code, perhaps under
scsi_setup_fs_cmnd().

The question is: was that actually the first crash, or did an even
earlier one scroll off?
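
The "Tainted: G M D W" field in the quoted trace points the same way:
as far as I know, D is set once the kernel has already hit an OOPS or
BUG, W records an earlier warning, and M records a machine check
exception. Below is a minimal sketch for decoding that field from a
trace line. The table is a hand-picked subset of the taint letters, and
decode_taint() is a hypothetical helper name, not an existing tool.

```python
import re

# Subset of kernel taint letters (assumption: not exhaustive, and the
# wording of each meaning is paraphrased for this sketch).
TAINT_FLAGS = {
    "P": "proprietary module was loaded",
    "G": "only GPL-compatible modules loaded (not a taint by itself)",
    "F": "module was force-loaded",
    "R": "module was force-unloaded",
    "M": "machine check exception was reported",
    "B": "bad page state was found",
    "D": "kernel died recently (an earlier OOPS or BUG)",
    "W": "a warning was issued earlier",
}


def decode_taint(line):
    """Return (flag, meaning) pairs for the 'Tainted:' field of a trace line.

    Returns an empty list when the line carries no 'Tainted:' field.
    """
    # Capture the run of single capital letters that follows 'Tainted:';
    # the kernel version string after it starts with a digit and ends the run.
    match = re.search(r"Tainted:\s+((?:[A-Z]\s+)+)", line)
    if match is None:
        return []
    return [
        (flag, TAINT_FLAGS.get(flag, "unknown to this sketch"))
        for flag in match.group(1).split()
    ]
```

Feeding it the "PID: 4313, comm: ktorrent Tainted: G M D W 2.6.36-rc3 #3"
line from the report yields the flags G, M, D and W in that order. The D
flag is the one that matters here, since it says this trace was not the
first failure.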
