Re: BUG at net/sunrpc/svc_xprt.c:921

From: Stanislav Kinsbursky
Date: Thu Jan 17 2013 - 08:24:13 EST


17.01.2013 17:03, J. Bruce Fields wrote:
On Thu, Jan 17, 2013 at 09:05:51AM +0400, Stanislav Kinsbursky wrote:
17.01.2013 02:51, Mark Lord wrote:
On 13-01-16 12:20 AM, Stanislav Kinsbursky wrote:

Mark, could you provide any call traces?

Call traces from where/what?
There's this one, posted earlier in the BUG report:

kernel BUG at net/sunrpc/svc_xprt.c:921!
Call Trace:
[<ffffffffa016a56a>] ? svc_recv+0xcc/0x338 [sunrpc]
[<ffffffffa0318bfc>] ? nfs_callback_authenticate+0x20/0x20 [nfsv4]
[<ffffffffa0318c19>] ? nfs4_callback_svc+0x1d/0x3c [nfsv4]
[<ffffffff810407e6>] ? kthread+0x81/0x89
[<ffffffff81040765>] ? kthread_freezable_should_stop+0x36/0x36
[<ffffffff812ea62c>] ? ret_from_fork+0x7c/0xb0
[<ffffffff81040765>] ? kthread_freezable_should_stop+0x36/0x36


Thanks!
I haven't seen the bug report.
Could you provide the link, please?

There's no bz if that's what you're asking for.

See the first message in the thread for the original report:

http://mid.gmane.org/<50F42F85.50907@xxxxxxxxxxxx>


Thanks, Bruce.
This looks like the old issue I was trying to fix with "SUNRPC: protect service sockets lists during per-net shutdown".
So, here is the problem as I see it: there is a transport which is being processed by a service thread, and its processing is racing with per-net service shutdown:

CPU#0:                                          CPU#1:

svc_recv                                        svc_close_net
svc_get_next_xprt (list_del_init(xpt_ready))
                                                svc_close_list (set XPT_BUSY and XPT_CLOSE)
                                                svc_clear_pools (xprt was gained on CPU#0 already)
                                                svc_delete_xprt (set XPT_DEAD)
svc_handle_xprt (is XPT_CLOSE => svc_delete_xprt())
BUG()
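
To make the interleaving above concrete, here is a rough userspace sketch that replays it sequentially. It is only a toy model: the toy_* names, the flag values and the struct are made up for illustration, and it assumes that the BUG() fires when svc_delete_xprt() finds XPT_DEAD already set, as the diagram implies.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Flag names follow the diagram; the bit values are illustrative only. */
enum { XPT_BUSY = 1u << 0, XPT_CLOSE = 1u << 1, XPT_DEAD = 1u << 2 };

/* Hypothetical stand-in for a transport; not the kernel's struct svc_xprt. */
struct toy_xprt {
    atomic_uint flags;
};

/* Sets the bit and reports whether it was already set. */
static bool toy_test_and_set(struct toy_xprt *x, unsigned int bit)
{
    return (atomic_fetch_or(&x->flags, bit) & bit) != 0;
}

/* Returns false where the real code is assumed to BUG(): already XPT_DEAD. */
static bool toy_delete_xprt(struct toy_xprt *x)
{
    return !toy_test_and_set(x, XPT_DEAD);
}

/* CPU#1 side: the shutdown path marks the transport and deletes it. */
static bool toy_close_net(struct toy_xprt *x)
{
    atomic_fetch_or(&x->flags, XPT_BUSY | XPT_CLOSE);
    /* The pool-clearing step finds nothing: CPU#0 already dequeued x. */
    return toy_delete_xprt(x);
}

/* CPU#0 side: the service thread sees XPT_CLOSE and deletes again. */
static bool toy_handle_xprt(struct toy_xprt *x)
{
    if (atomic_load(&x->flags) & XPT_CLOSE)
        return toy_delete_xprt(x);
    return true;
}

/* Replays the diagram: the first delete succeeds, the second would BUG(). */
static bool toy_replay_race(void)
{
    struct toy_xprt x;
    bool first, second;

    atomic_init(&x.flags, 0);
    first = toy_close_net(&x);    /* CPU#1 runs to completion */
    second = toy_handle_xprt(&x); /* CPU#0 resumes afterwards */
    return first && !second;
}
```

In this model toy_replay_race() returns true exactly when the second delete lands on a transport that is already marked dead, which is the situation in the trace.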

So, from my POV, we need some way to:
1) Skip such in-progress transports on the svc_close_net() call (there is no way to detect them, or at least I don't see one)
2) Delete the transport somewhere after svc_xprt_received()

But there is a problem with svc_xprt_received(): it calls svc_xprt_put() (svc_recv->svc_handle_xprt->svc_xprt_received->svc_xprt_put). If we are the only user, the transport will be destroyed at that point. But the transport is dereferenced later in svc_recv(), after the svc_handle_xprt() call.
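
The reference-count problem can be sketched in the same toy style. Everything here is hypothetical (the toy_* helpers, the plain int counter, the "destroyed" flag standing in for freed memory); it only shows why a put inside the received step is dangerous for the last user, and how holding an extra reference across the call would avoid it. That extra reference is just one conventional pattern, not a proposed patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical transport with a plain counter; single-threaded toy only. */
struct toy_xprt {
    int refcount;
    bool destroyed; /* stands in for the memory having been freed */
};

static void toy_xprt_get(struct toy_xprt *x)
{
    x->refcount++;
}

static void toy_xprt_put(struct toy_xprt *x)
{
    if (--x->refcount == 0)
        x->destroyed = true;
}

/* Models a received step that ends by dropping one reference. */
static void toy_xprt_received(struct toy_xprt *x)
{
    toy_xprt_put(x);
}

/* Caller holds no extra reference: reports whether x survived the call. */
static bool toy_recv_unpinned(struct toy_xprt *x)
{
    toy_xprt_received(x);
    return !x->destroyed; /* a later dereference would be a use-after-free */
}

/* Caller pins x first, so it is still valid after the received step. */
static bool toy_recv_pinned(struct toy_xprt *x)
{
    bool alive;

    toy_xprt_get(x);
    toy_xprt_received(x);
    alive = !x->destroyed; /* safe to look at x here */
    toy_xprt_put(x);       /* drop the pin once done with x */
    return alive;
}
```

With a starting count of 1, toy_recv_unpinned() reports that the transport is already gone, while toy_recv_pinned() still sees it alive and only destroys it on the final put.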

What do you think, Bruce?


--b.



--
Best regards,
Stanislav Kinsbursky