Re: Process Hang in __read_seqcount_begin

From: Eric Dumazet
Date: Mon Oct 22 2012 - 13:01:34 EST


On Mon, 2012-10-22 at 09:46 -0700, Peter LaDow wrote:
> I posted this problem some time back on the linux-rt-users and
> netfilter lists. Since then, we thought we had a workaround to avoid
> this problem, so we dropped the issue. But now 5 months later, the
> problem has reappeared. And this time it is much more serious and
> much more difficult to re-create. After perusing both those lists,
> I'm not sure if those were the proper places to post. The netfilter
> list seems to be more focused on the user space side of things, and
> the RT page indicates that kernel side RT issues should go to lkml.
>
> Anyway, here's a repost of that problem from July. Perhaps somebody
> here can point us in the right direction.
>
> We are running 3.0.36-rt57 on a powerpc box. During some testing with
> heavy loads and interfaces coming up/going down (specifically PPP), we
> have run into a case where iptables hangs and cannot be killed. It
> requires a reboot to fix the problem.
>
> Connecting the BDI and debugging the kernel, we get:
>
> #0 get_counters (t=0xdd5145a0, counters=0xe3458000)
> at include/linux/seqlock.h:66
> #1 0xc026b4ac in do_ipt_get_ctl (sk=<value optimized out>,
> cmd=<value optimized out>, user=0x10612078, len=<value optimized out>)
> at net/ipv4/netfilter/ip_tables.c:918
> #2 0xc022226c in nf_sockopt (sk=<value optimized out>, pf=2 '\002',
> val=<value optimized out>, opt=<value optimized out>, len=0xdd4c7d4c,
> get=1) at net/netfilter/nf_sockopt.c:109
> #3 0xc0236b1c in ip_getsockopt (sk=0xdf071480, level=<value optimized out>,
> optname=65, optval=0x10612078 <Address 0x10612078 out of bounds>,
> optlen=0xbfbe0c2c) at net/ipv4/ip_sockglue.c:1308
> #4 0xc02522a8 in raw_getsockopt (sk=0xdf071480, level=<value optimized out>,
> optname=<value optimized out>, optval=<value optimized out>,
> optlen=<value optimized out>) at net/ipv4/raw.c:811
> #5 0xc01f4c38 in sock_common_getsockopt (sock=<value optimized out>,
> level=<value optimized out>, optname=<value optimized out>,
> optval=<value optimized out>, optlen=<value optimized out>)
> at net/core/sock.c:2157
> #6 0xc01f2df8 in sys_getsockopt (fd=<value optimized out>, level=0,
> optname=65, optval=0x10612078 <Address 0x10612078 out of bounds>,
> optlen=0xbfbe0c2c) at net/socket.c:1839
> #7 0xc01f45b4 in sys_socketcall (call=15, args=<value optimized out>)
> at net/socket.c:2421
>
> It seems to be stuck in __read_seqcount_begin. From include/linux/seqlock.h:
>
> static inline unsigned __read_seqcount_begin(const seqcount_t *s)
> {
> 	unsigned ret;
>
> repeat:
> 	ret = ACCESS_ONCE(s->sequence);
> 	if (unlikely(ret & 1)) {
> 		cpu_relax();
> 		<----- It is always here
> 		goto repeat;
> 	}
> 	return ret;
> }
>
> I've been scouring the mailing lists and Google searches trying to
> find something, but thus far have come up with nothing.
>
> Any tips would be appreciated.

This looks like corruption of s->sequence: its value is odd even
though no writer is active.

Does local_bh_disable() disable preemption on RT?
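
To make the failure mode concrete, here is a minimal userspace sketch of
the reader/writer protocol. It is not the kernel code: struct
seqcount_model and the model_* helpers are hypothetical names made up
for illustration, and the reader is bounded by max_spins so that the
example terminates instead of hanging the way the real loop does.

```c
/*
 * Userspace model of a sequence counter. Illustration only: this is
 * not the kernel's seqcount_t, and every name here is hypothetical.
 */
struct seqcount_model {
	volatile unsigned sequence;
};

/* A writer makes the count odd on entry... */
void model_write_begin(struct seqcount_model *s)
{
	s->sequence++;
}

/* ...and even again on exit. */
void model_write_end(struct seqcount_model *s)
{
	s->sequence++;
}

/*
 * Bounded variant of the reader loop quoted above. Returns 0 and
 * stores the even snapshot in *seq on success, or -1 if the count is
 * still odd after max_spins attempts. The real loop has no such bound
 * and would spin forever in the second case.
 */
int model_read_begin_bounded(const struct seqcount_model *s,
			     unsigned max_spins, unsigned *seq)
{
	unsigned i;

	for (i = 0; i < max_spins; i++) {
		unsigned ret = s->sequence;

		if (!(ret & 1)) {
			*seq = ret;
			return 0;
		}
	}
	return -1;
}
```

After a matched model_write_begin()/model_write_end() pair the reader
returns at once with an even snapshot. If model_write_begin() runs and
the matching model_write_end() never does (the writer never gets to
finish, or the count is corrupted to an odd value), every reader fails
no matter how long it spins, which matches the hang reported above.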



