Re: [OOPS] 2.6.11 - NMI lockup with CFQ scheduler

From: Mike Anderson
Date: Wed Apr 06 2005 - 15:36:25 EST

Next message: Hua Zhong: "RE: Kernel SCM saga.."
Previous message: Ivan Yosifov: "Re: Out of memory with Java 1.5 and 2.6.11.6"
In reply to: Jens Axboe: "Re: [OOPS] 2.6.11 - NMI lockup with CFQ scheduler"
Next in thread: James Bottomley: "Re: [OOPS] 2.6.11 - NMI lockup with CFQ scheduler"

Tejun Heo [htejun@xxxxxxxxx] wrote:
> Jens Axboe wrote:
> >On Wed, Apr 06 2005, Arjan van de Ven wrote:
> >
> >>>@@ -324,6 +334,7 @@
> >>> issue_flush_fn *issue_flush_fn;
> >>> prepare_flush_fn *prepare_flush_fn;
> >>> end_flush_fn *end_flush_fn;
> >>>+ release_queue_data_fn *release_queue_data_fn;
> >>>
> >>> /*
> >>> * Auto-unplugging state
> >>
> >>where does this function method actually get called?
> >
> >
> >I missed the hunk in ll_rw_blk.c, rmk pointed the same thing out not 5
> >minutes ago :-)
> >
> >The patch would not work anyways, as scsi_sysfs.c clears queuedata
> >unconditionally. This is a better work-around, it just makes the queue
> >hold a reference to the device as well only killing it when the queue is
> >torn down.
> >
> >Still not super happy with it, but I don't see how to solve the circular
> >dependency problem otherwise.
> >
>
> Hello, Jens.
>
> I've been thinking about it for a while. The problem is that we're
> reference counting two different objects to track lifetime of one
> entity. This happens in both SCSI upper and mid layers. In the upper
> layer, genhd and scsi_disk (or scsi_cd, ...) are ref'ed separately while
> they share their destiny together (not really different entities) and in
> the middle layer scsi_device and request_queue do the same thing.
> Circular dependency is occurring because we separate one entity into two
> and reference count them separately. The two are actually one and
> necessarily want each other. (until death do them part. Wow, serious. :-)
>
> IMHO, what we need to do is consolidate ref counting such that in each
> layer only one object is reference counted, and the other object is
> freed when the ref counted object is released. The object of choice
> would be genhd in upper layer and request_queue in mid layer. All
> ref-counting should be updated to only ref those objects. We'll need to
> add a release callback to genhd and make request_queue properly
> reference counted.
>
> Conceptually, scsi_disk extends genhd and scsi_device extends
> request_queue. So, to go one step further, as what UL represents is
> genhd (disk device) and ML request_queue (request-based device),
> embedding scsi_disk into genhd and scsi_device into request_queue will
> make the architecture clearer. To do this, we'll need something like
> alloc_disk_with_udata(int minors, size_t udata_len) and the equivalent
> for request_queue.
>
> I've done this half-way and then doing it without fixing the SCSI
> model seemed silly so got into working on the state model. (BTW, the
> state model is almost done, I'm about to run tests.)
>
> What do you think? Jens?

Well, I think "extends" is one way to look at the subsystem objects.
Couldn't it also be said that the objects from each subsystem just have
a relationship (parent / child, etc.)? As reference counting has been
implemented in each subsystem, some interfaces that cross subsystem
boundaries have not (or had not) been converted to use similar lifetime
rules.

Your solution tries to solve the problem by creating a new, larger
object that contains both of the old objects. Another solution would be to
use consistent lifetime rules and stay with smaller objects. Unless
going to larger objects helps with allocation fragmentation or we get some
other benefit, it would seem that these combined structures may at some
point in the future limit the creation of lighter or more flexible objects.
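
To make the combined-object approach concrete, here is a rough user-space
sketch of what an allocator along the lines of the proposed
alloc_disk_with_udata(int minors, size_t udata_len) might look like. It is
only an illustration: every toy_ name, the release callback signature and
the structure layout are assumptions made up for this example, not kernel
code and not taken from any posted patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for genhd. Not the kernel's struct gendisk. */
struct toy_disk {
    int refcount;                 /* the only refcount for both objects */
    int minors;
    void (*release)(void *udata); /* hypothetical release callback */
    /* Driver-private area; the union only forces a safe alignment. */
    union { long long ll; long double ld; void *p; } udata[];
};

/* Toy stand-in for scsi_disk, living inside the disk allocation. */
struct toy_scsi_disk {
    int openers;
};

int toy_release_calls; /* counts release callbacks, for checking only */

void toy_scsi_disk_release(void *udata)
{
    (void)udata; /* nothing to tear down in this sketch */
    toy_release_calls++;
}

/* One allocation holds the disk and the driver's private data. */
struct toy_disk *toy_alloc_disk_with_udata(int minors, size_t udata_len,
                                           void (*release)(void *udata))
{
    struct toy_disk *disk = calloc(1, sizeof(*disk) + udata_len);

    if (!disk)
        return NULL;
    disk->refcount = 1; /* reference owned by the caller */
    disk->minors = minors;
    disk->release = release;
    return disk;
}

void *toy_disk_udata(struct toy_disk *disk)
{
    return disk->udata;
}

void toy_disk_get(struct toy_disk *disk)
{
    assert(disk->refcount > 0);
    disk->refcount++;
}

/* Returns 1 if this put dropped the last reference and freed the disk. */
int toy_disk_put(struct toy_disk *disk)
{
    assert(disk->refcount > 0);
    if (--disk->refcount > 0)
        return 0;
    if (disk->release)
        disk->release(disk->udata); /* private data dies with the disk */
    free(disk);
    return 1;
}
```

The private data has no lifetime of its own here, which removes the
circular reference, but it also means the two objects can never be
allocated or released separately.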

It would appear another solution is that when you allocate a resource from
another subsystem (e.g. blk_init_queue), both subsystems participate
in the same reference counting model, and in the allocation routine you
pass in your object to be reference counted by the allocating subsystem.
Then when it is time to shut down, you do not free the other subsystem's
object directly, but use the normal release routines.
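
A rough user-space sketch of that idea follows. It is only meant to show
the ordering of gets and puts. The toy_ names, the counter type and the
allocator taking an owner argument are all assumptions made up for this
example; the real blk_init_queue takes no such argument.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal shared refcount type that both toy subsystems agree on. */
struct toy_ref {
    int count;
    void (*release)(struct toy_ref *ref);
};

void toy_ref_init(struct toy_ref *ref, void (*release)(struct toy_ref *ref))
{
    ref->count = 1;
    ref->release = release;
}

void toy_get(struct toy_ref *ref)
{
    assert(ref->count > 0);
    ref->count++;
}

void toy_put(struct toy_ref *ref)
{
    assert(ref->count > 0);
    if (--ref->count == 0)
        ref->release(ref);
}

/* "Block layer" object; ref is the first member so casts are valid. */
struct toy_queue {
    struct toy_ref ref;
    struct toy_ref *owner; /* object passed in by the allocating caller */
};

/* "SCSI" object that asked for the queue. */
struct toy_device {
    struct toy_ref ref;
    struct toy_queue *queue;
};

int toy_queues_freed, toy_devices_freed; /* for checking only */

void toy_queue_release(struct toy_ref *ref)
{
    struct toy_queue *q = (struct toy_queue *)ref;
    struct toy_ref *owner = q->owner;

    free(q);
    toy_queues_freed++;
    toy_put(owner); /* drop the reference taken in toy_init_queue() */
}

void toy_device_release(struct toy_ref *ref)
{
    free((struct toy_device *)ref);
    toy_devices_freed++;
}

/* The allocator pins the caller's object for the queue's lifetime. */
struct toy_queue *toy_init_queue(struct toy_ref *owner)
{
    struct toy_queue *q = calloc(1, sizeof(*q));

    if (!q)
        return NULL;
    toy_ref_init(&q->ref, toy_queue_release);
    q->owner = owner;
    toy_get(owner);
    return q;
}

struct toy_device *toy_device_alloc(void)
{
    struct toy_device *dev = calloc(1, sizeof(*dev));

    if (!dev)
        return NULL;
    toy_ref_init(&dev->ref, toy_device_release);
    dev->queue = toy_init_queue(&dev->ref);
    if (!dev->queue) {
        free(dev);
        return NULL;
    }
    return dev;
}

/* Shutdown drops the device's queue reference; it never frees the queue. */
void toy_device_shutdown(struct toy_device *dev)
{
    struct toy_queue *q = dev->queue;

    dev->queue = NULL;
    toy_put(&q->ref);
}
```

In this sketch, if some other user still holds the queue when the device
shuts down, both objects stay around until that user does its final put.
Only then does the queue's release routine run and drop the device.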

-andmike
--
Michael Anderson
andmike@xxxxxxxxxx
