Re: dst_ifdown breaks infiniband?

From: Michael S. Tsirkin
Date: Sun Mar 18 2007 - 15:53:41 EST


Quoting Alexey Kuznetsov <kuznet@xxxxxxxxxxxxx>:
Subject: Re: dst_ifdown breaks infiniband?
>
> Hello!
>
> > This is not new code, and should have triggered long time ago,
> > so I am not sure how come we are triggering this only now,
> > but somehow this did not lead to crashes in 2.6.20
>
> I see. I guess this was plain luck.
>
>
> > Why is neighbour->dev changed here?
>
> It holds reference to device and prevents its destruction.
> If dst is held somewhere, we cannot destroy the device and deadlock
> while unregister.
>
> We could not invalidate dst->neighbour but it looked safe to invalidate
> neigh->dev after quiescent state. Obviosuly, it is not and it never was safe.
> Was supposed to be repaired asap, but this did not happen. :-(
>
> > Can dst->neighbour be changed to point to NULL instead, and the neighbour
> > released?
>
> It should be cleared and we should be sure it will not be destroyed
> before quiescent state.

I'm confused. didn't you say dst_ifdown is called after quiescent state?

> Seems, this is the only correct solution, but to do this we have
> to audit all the places where dst->neighbour is dereferenced for
> RCU safety.
>
> Actually, it is very good you caught this eventually, the bug was
> so _disgusting_ that it was "forgotten" all the time, waiting for
> someone who will point out that the king is naked. :-)
>
> Alexey

This does not sound like something that's likely to be accepted in 2.6.21, right?

Any simpler ideas?

--
MST
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/