Re: Sysfs and suicidal attributes

From: Tejun Heo
Date: Tue Jul 10 2007 - 01:09:58 EST


Hello, Alan.

Alan Stern wrote:
> You remember the problem we faced with "suicidal" sysfs attributes and
> the device_schedule_callback() routine that fixes it?

Sure.

> Well, it turns out that this approach may confict with suspend/resume
> processing. In brief, it's not a good idea to unregister devices while
> a suspend is in progress, but on the other hand it's not a good idea to
> block keventd waiting until the suspend is over.

Is it because keventd isn't frozen during suspend? I'm afraid we might
be trying to solve the problem at the wrong layer. Device
addition/removal should be _locked out_ during suspend/resume.
Depending on executing thread being frozen is more subtle and fragile
and freezing might go away altogether.

IMHO, what's needed is a giant rwlock protecting the whole device
addition/removal while suspend is in progress (addition/removal during
resume should be safe as long as we put the new devices on the already
processed list). The lock can probably replace dpm_sem and dpm_list_sem.

It's probably necessary to use trylock from addition/removal to avoid
deadlock. Drivers which support hotplugging need to rescan/revalidate
the bus during resume anyway so failing addition/removal during suspend
shouldn't be a problem.

> I haven't paid much attention to your sysfs updates. Is it still true
> that calls to show/store methods are mutually exclusive with attribute
> unregistration? Assuming the answer is Yes, would it be possible to
> bypass that mutual exclusion in certain explicit cases? Here's what I
> have in mind:
>
> The user writes to an attribute file.
>
> The sysfs core calls the attribute's store method.
>
> The method tells the sysfs core to pretend that the call
> temporarily doesn't exist, or has completed, or something
> like that.
>
> The method safely unregisters the attribute file, with no
> mutual exclusion problems and no deadlock. Of course, the
> unregistration will still block until all _other_ method
> calls for this attribute are complete.
>
> The method tells the sysfs core to stop pretending and
> go back to its normal state.
>
> The method returns, and the sysfs core takes whatever actions
> are needed to fully release the attribute file.
>
> The idea is that there could be a way to allow unregistration while a
> method is still running, if the method specifically requests it. If we
> could do this then device_schedule_callback() would be unnecessary.

Regardless of the suspend problem, I like the above idea. Actually, I
was thinking about doing exactly that for suicidal nodes when I first
found out the problem. All that's needed is an owner field which allows
the owning thread to reenter (what was the name for this type of mutex?
BSD folks like it). It's probably better to put some restrictions to
allow only the specific case tho.

I like it because it shifts complexity from the drivers into driver
core. IOW, the driver model is kinder to drivers that way - the driver
writer doesn't have to care whether something is suicidal or not - and I
think that's the way we should be headed although we're not good at it yet.

Thanks.

--
tejun
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/