Re: Process Hang in __read_seqcount_begin

From: Eric Dumazet
Date: Wed Oct 24 2012 - 00:32:38 EST


On Tue, 2012-10-23 at 17:15 -0700, Peter LaDow wrote:
> (Sorry for the subject change, but I wanted to try and pull in those
> who work on RT issues, and the subject didn't make that obvious.
> Please search for the same subject without the RT Linux trailing
> text.)
>
> Well, more information. Even with SMP enabled (and presumably the
> migrate_enable having calls to preempt_disable), we still got the
> lockup in iptables-restore. I did more digging, and it looks like the
> complete stack trace includes a call from iptables-restore (through
> setsockopt with IPT_SO_SET_ADD_COUNTERS). This seems to be a
> potential multiple writer case where the counters are updated through
> the syscall and the kernel is updating the counters as it filters
> packets.
>
> I think there might be a race on the update to xt_recseq.sequence,
> since the RT patches remove the spinlock in seqlock_t. Thus multiple
> writers can corrupt the sequence count.



> And I thought the SMP
> configuration would disable preemption when local_bh_disable() is
> called. And indeed, looking at the disassembly, I see
> preempt_disable() (though optimized, goes to add_preempt_count() ) is
> being called.
>
> Yet we still see the lockup in the get_counters() in iptables. Which,
> it seems, is because of some sort of problem with the sequence. It
> doesn't appear to be related to the preemption, and perhaps there is
> some other corruption of the sequence counter happening. But the only
> places I see it modified is in xt_write_recseq_begin and
> xt_write_recseq_end, which is only in the netfilter code
> (ip6_tables.c, ip_tables.c, and arp_tables.c). And every single call
> is preceeded by a call to local_bh_disable().
>
> This problem is a huge one for us. And so far I'm unable to track
> down how this is occurring.
>
> Any other tips? I presume this is the proper place for RT issues.

Hmm...

Could you try following patch ?

diff --git a/include/linux/netfilter/x_tables.h b/include/linux/netfilter/x_tables.h
index 8d674a7..053f9e7 100644
--- a/include/linux/netfilter/x_tables.h
+++ b/include/linux/netfilter/x_tables.h
@@ -471,30 +471,24 @@ DECLARE_PER_CPU(seqcount_t, xt_recseq);
*
* Begin packet processing : all readers must wait the end
* 1) Must be called with preemption disabled
- * 2) softirqs must be disabled too (or we should use this_cpu_add())
* Returns :
* 1 if no recursion on this cpu
* 0 if recursion detected
*/
static inline unsigned int xt_write_recseq_begin(void)
{
- unsigned int addend;
-
+ unsigned int old, new;
+ seqcount_t *ptr = &__get_cpu_var(xt_recseq);
/*
* Low order bit of sequence is set if we already
* called xt_write_recseq_begin().
*/
- addend = (__this_cpu_read(xt_recseq.sequence) + 1) & 1;
-
- /*
- * This is kind of a write_seqcount_begin(), but addend is 0 or 1
- * We dont check addend value to avoid a test and conditional jump,
- * since addend is most likely 1
- */
- __this_cpu_add(xt_recseq.sequence, addend);
- smp_wmb();
-
- return addend;
+ old = ptr->sequence;
+ new = old | 1;
+ /* Note : cmpxchg() is a memory barrier, we dont need smp_wmb() */
+ if (old != new && cmpxchg(&ptr->sequence, old, new) == old)
+ return 1;
+ return 0;
}

/**


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/