Re: post 2.6.28 regression: device_initialize() now sleeps, andmay fail without recovery strategy

From: Greg KH
Date: Fri Jan 09 2009 - 16:32:00 EST


On Fri, Jan 09, 2009 at 10:13:55PM +0100, Stefan Richter wrote:
> Greg KH wrote:
> > On Fri, Jan 09, 2009 at 07:35:42PM +0100, Stefan Richter wrote:
> >> We can fix the bug by changing firewire-core, but
> >> a) it'd be more than a one-liner,
> >> b) who knows which other subsystems are affected.
> >
> > I agree.
> >
> > I originally looked at changing this to be at device_add time, but I
> > think there are some code paths that do device_initialize and then do
> > some operations on the device before calling device_add.
>
> get_device() and put_device() seem to be about the only things that are
> interesting before device_add().
>
> Don't know if a final put_device() in this situation

Hm, that could be pretty simple to handle. I'd really like to force the
kobject itself to be dynamic, and inside the private portion of the
device structure. If I do that, then get_ and put_ would need to
allocate the object if it wasn't present. But that would mean that
get_device could sleep, which isn't the case today (put_device() can
always sleep, that's not an issue.)

> > But I could be
> > wrong, let me do some testing first before forcing you to make that big
> > change to the firewire core.
>
> It isn't actually that big. And the added complication could hopefully
> be covered by comments about the caveats.
>
> Actually, maybe it would be better for the firewire stack to move the
> concerned stuff into a non-atomic context. There are other things we do
> in there and atomic context isn't very comfortable for all this. But
> that would be a much bigger change.

Well, I'd always recommend doing things in non-atomic context wherever
possible, so I'll not object that hard to your patch either way :)

> >> Next, the above code is bogus. In 2.6.28, device_initialize() could
> >> never fail and was thus safe to use as a void-valued function.
> >>
> >> How does driver core handle dev->p == NULL in subsequent usages of dev now?
> >
> > It dies a flaming horrible death, pretty much like the whole rest of the
> > system if allocating such a small ammount of memory is causing failures
> > :)
>
> Well, at least code which allocates struct device can check for failure
> and handle it, while the allocator of dev->p can't even check. Unless
> you change device_initialize() to return error status and add error
> handling all over the place...

yeah, that would be a much bigger task than I'm really pondering,
although it probably is the correct thing to do...

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/