Re: [PATCH] Avoid null-pointer access in w1/slaves/w1_therm

From: David Fries
Date: Wed Mar 11 2015 - 20:44:59 EST


On Tue, Mar 10, 2015 at 01:34:14AM +0100, Thorsten Bschorr wrote:
> > It looks like your patch runs into deadlock problems:

I was testing reading and removing the slave; I didn't test two
readers. I'll keep that in mind.

For some reason I was thinking it was a try lock after the sleep. It
isn't; a signal can cause it to return with an error, but yes, that's
not going to work. I have another version that uses an atomic counter
in place of this mutex; it requires a loop in the remove slave path.
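
Roughly the shape of that version, inside w1_therm.c (a sketch only;
the struct and field names here are illustrative, not the exact
patch):

struct my_family_data {
	u8 rom[9];
	atomic_t refcnt;	/* readers currently inside w1_slave_show */
};

static ssize_t w1_slave_show(struct device *device,
			     struct device_attribute *attr, char *buf)
{
	struct w1_slave *sl = dev_to_w1_slave(device);
	struct my_family_data *data = sl->family_data;

	atomic_inc(&data->refcnt);	/* pin family_data across the sleep */

	/* ... drop the bus mutex, msleep() for the conversion,
	 * retake the bus mutex, read the scratchpad ... */

	atomic_dec(&data->refcnt);
	return 0;
}

static void w1_therm_remove_slave(struct w1_slave *sl)
{
	struct my_family_data *data = sl->family_data;

	/* the loop in remove slave: wait for in-flight readers to
	 * drain before family_data goes away */
	while (atomic_read(&data->refcnt) > 0)
		msleep(1000);

	kfree(sl->family_data);
	sl->family_data = NULL;
}

The counter only protects family_data from being freed under a
reader; the bus locking itself stays as it is.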

I have a one-wire network with 14 temperature sensors on it. What I
do is run a program that talks to the kernel over netlink, which is
now non-blocking for one wire after an earlier set of changes I made.
It sends out all the conversion requests, waits, sends the read
request packets, then gets the results back and relays them to the
program that requested them. It doesn't use w1_therm and avoids all
these problems. Reading the sensors sequentially through w1_therm
would take more than 14 * 0.750 = 10.5 seconds, and there's no way
with w1_therm to read them all at once short of 14 threads or
programs doing so; the problem is that netlink takes a lot more
programming.
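
The brute-force w1_therm-only alternative looks like this from
userspace (a sketch, build with gcc -pthread; the sensor IDs are
placeholders, one borrowed from the log below): one thread per
sensor, so the 750 ms conversions overlap instead of running back to
back.

#include <pthread.h>
#include <stdio.h>

static const char *sensors[] = {
	"/sys/bus/w1/devices/10-000802cc045a/w1_slave",
	"/sys/bus/w1/devices/10-000802cc0001/w1_slave",
	/* ... one entry per sensor ... */
};

static void *read_sensor(void *arg)
{
	const char *path = arg;
	char line[128];
	FILE *f = fopen(path, "r");

	if (!f)
		return NULL;
	while (fgets(line, sizeof(line), f))
		printf("%s: %s", path, line);	/* CRC line, then t=... */
	fclose(f);
	return NULL;
}

int main(void)
{
	size_t n = sizeof(sensors) / sizeof(sensors[0]);
	pthread_t tid[n];
	size_t i;

	for (i = 0; i < n; i++)
		pthread_create(&tid[i], NULL, read_sensor,
			       (void *)sensors[i]);
	for (i = 0; i < n; i++)
		pthread_join(tid[i], NULL);
	return 0;
}

Since the bus mutex is dropped during the sleep, the sleeps overlap
and the whole pass takes roughly one conversion time instead of
fourteen. Notably, this is exactly the multi-reader pattern that
trips the deadlock described below.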

> > I have a cron job reading my sensors. If I read the sensors from
> > another thread (e.g. via cat), the second thread can produce a
> > deadlock:
> >
> > * thread 1 has the bus & sl lock
> > * thread 1 drops the bus lock, keeps the sl lock and sleeps
> > * thread 2 gets the bus lock, waits for the sl lock
> > * thread 1 returns from sleep, waits for the bus lock, but this is held by thread 2
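
Right, and at that point thread 1 holds sl and wants the bus lock
while thread 2 holds the bus lock and wants sl: a classic AB-BA
cycle. Modeled in plain C (illustration only; bus_mutex and sl_lock
stand in for the master's bus mutex and the per-slave lock), together
with the reordering you describe below:

#include <linux/delay.h>
#include <linux/mutex.h>

static DEFINE_MUTEX(bus_mutex);
static DEFINE_MUTEX(sl_lock);

/* deadlock-prone order: bus lock first, dropped across the sleep */
static void show_deadlock_prone(void)
{
	mutex_lock(&bus_mutex);
	mutex_lock(&sl_lock);
	/* start the conversion ... */
	mutex_unlock(&bus_mutex);	/* free the bus during the sleep */
	msleep(750);			/* still holding sl_lock */
	mutex_lock(&bus_mutex);		/* blocks forever if another reader
					 * holds bus_mutex and is itself
					 * waiting on sl_lock */
	/* read the scratchpad ... */
	mutex_unlock(&bus_mutex);
	mutex_unlock(&sl_lock);
}

/* reordered: sl_lock always first, so both threads acquire the two
 * locks in the same order and the cycle cannot form */
static void show_fixed_order(void)
{
	mutex_lock(&sl_lock);
	mutex_lock(&bus_mutex);
	/* start the conversion ... */
	mutex_unlock(&bus_mutex);
	msleep(750);
	mutex_lock(&bus_mutex);		/* safe: a second reader blocked
					 * on sl_lock holds no bus lock */
	/* read the scratchpad ... */
	mutex_unlock(&bus_mutex);
	mutex_unlock(&sl_lock);
}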
>
>
> Acquiring the sl lock before the bus lock prevents this deadlock (no
> change of locking order). With search enabled, removing a device via
> w1_search_process_cb then also worked as intended:
>
> Mar 10 01:29:04 pi kernel: [ 924.870893] w1_therm_remove_slave about to lock faily_data->lock
> Mar 10 01:29:04 pi kernel: [ 924.870935] w1_therm_remove_slave unlocked faily_data->lock
>
> Mar 10 01:29:04 pi kernel: [ 924.871115] w1_therm_remove_slave about to lock faily_data->lock
> Mar 10 01:29:05 pi kernel: [ 925.151242] start sleep d117a600 refcnt 1 **
> Mar 10 01:29:05 pi kernel: [ 925.151277] end sleep d117a600 refcnt 0 **
> Mar 10 01:29:11 pi kernel: [ 931.295344] w1_slave_driver 10-000802cc045a: Read failed CRC check
> Mar 10 01:29:11 pi kernel: [ 931.295437] w1_therm_remove_slave unlocked faily_data->lock
> ** I get the ref-cnt before and after the sleep and only log if they differ.
>
>
> But I am unable to judge whether moving the sl lock to the beginning
> of w1_slave_show can cause deadlocks in other scenarios.

I'm not sure; I would probably switch back to the reference counting
version I wrote earlier, or make the bus mutex lock a timed lock, or
try lock it first.
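
For the try-lock variant: kernel mutexes have no timed lock, so a
"timed" bus lock would really be bounded retries around
mutex_trylock(), something like this sketch (assuming the bus lock is
dev->bus_mutex as in current w1_therm; the retry bound is arbitrary):

static int w1_bus_lock_bounded(struct w1_master *dev)
{
	int tries = 100;

	while (!mutex_trylock(&dev->bus_mutex)) {
		if (--tries == 0)
			return -EBUSY;	/* give up instead of deadlocking */
		msleep(10);
	}
	return 0;
}

w1_slave_show would then return the error to userspace, which trades
the deadlock for an occasional -EBUSY that a reader can simply retry.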

--
David Fries <david@xxxxxxxxx> PGP pub CB1EE8F0
http://fries.net/~david/