Re: My system stops during startup with curretn git tree of 2.6.25-rc2

From: Rafael J. Wysocki
Date: Mon Feb 18 2008 - 11:42:42 EST


On Monday, 18 of February 2008, Laszlo Attila Toth wrote:
> Jiri Kosina wrote:
> > On Mon, 18 Feb 2008, Laszlo Attila Toth wrote:
> >
> >> Okay, but I can't figure out what's the problem with it. I don't have
> >> wireless card on my linux box also I can't test it but everything else
> >> works. Swap is mounted. The concurrency cannot be a problem because the
> >> write operation is protected by a lock.
> >
> > - write_lock_bh(&dev_base_lock);
> > - dev->link_mode = nla_get_u8(tb[IFLA_LINKMODE]);
> > - write_unlock_bh(&dev_base_lock);
> > + if (dev->link_mode != nla_get_u8(tb[IFLA_LINKMODE])) {
> > + write_lock_bh(&dev_base_lock);
> > + dev->link_mode = nla_get_u8(tb[IFLA_LINKMODE]);
> > + write_lock_bh(&dev_base_lock);
> > + modified = 1;
> > + }
> > }
> >
> > 1) you are accessing dev->link_mode and tb[] outside the dev_base_lock
>
> yes, because tb[IFLA_LINKMODE] is not used by someone else in this case
> only dev->link_mode. Although its value is unpredictable in case of a
> concurrent access in the condition, it does not affect the final value
> of dev->link_mode but the length of the critical section remains
> minimal. The if statement may be inside the lock.
>
> > 2) there is obvious and immediate deadlock -- you acquire the
> > dev_base_lock twice, without any unlock, just look at the chunk above
>
> Indeed:
> "Feb 16 16:51:49 sandman kernel: BUG: rwlock recursion on CPU#0,"
>
> I missed it. I copied the code from another patch which didn't contain
> the two locking statements and when I copied them back it became a
> copy-paste bug.
>
>
> > 3) even with this deadlock fixed, Rafael states that either NM or
> > wpa_supplicant (I don't recall from top of my head) still don't work
>
> That's bad. Does my suggestion solve the problem? Again:
>
> - if (modified)
> - netdev_state_change(dev);
> + if (modified && dev->flags & IFF_UP)
> + call_netdevice_notifiers(NETDEV_CHANGE, dev)

All in all, I gather you wanted me to test the patch below. :-)

Yes, that helps.

Thanks,
Rafael

---
Fix net/core/rtnetlink.c breakage caused by commit
45b503548210fe6f23e92b856421c2a3f05fd034
"[RTNETLINK]: Send a single notification on device state changes."

Signed-off-by: Rafael J. Wysocki <rjw@xxxxxxx>
---
net/core/rtnetlink.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

Index: linux-2.6/net/core/rtnetlink.c
===================================================================
--- linux-2.6.orig/net/core/rtnetlink.c
+++ linux-2.6/net/core/rtnetlink.c
@@ -853,7 +853,7 @@ static int do_setlink(struct net_device
if (dev->link_mode != nla_get_u8(tb[IFLA_LINKMODE])) {
write_lock_bh(&dev_base_lock);
dev->link_mode = nla_get_u8(tb[IFLA_LINKMODE]);
- write_lock_bh(&dev_base_lock);
+ write_unlock_bh(&dev_base_lock);
modified = 1;
}
}
@@ -870,8 +870,8 @@ errout:
if (send_addr_notify)
call_netdevice_notifiers(NETDEV_CHANGEADDR, dev);

- if (modified)
- netdev_state_change(dev);
+ if (modified && dev->flags & IFF_UP)
+ call_netdevice_notifiers(NETDEV_CHANGE, dev);

return err;
}
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/