Re: Process Hang in __read_seqcount_begin

From: Peter LaDow
Date: Fri Oct 26 2012 - 12:15:39 EST


On Tue, Oct 23, 2012 at 9:32 PM, Eric Dumazet <eric.dumazet@xxxxxxxxx> wrote:
> Could you try following patch ?

So, I applied your patch. And so far, it seems to have fixed the
issue. I've had my systems running for 48 hours, and no lockup in
iptables. Usually, I could get a lockup to occur within 12 to 24
hours, and running this long tells me this patch may have things
fixed.

Now, I have a couple of questions.

First, I'm not sure how this actually fixes things. It does seem
there was a race before. But it isn't clear to me how this patch
eliminates the race. I fear that this patch only reduces the window
in which a race could occur, but doesn't eliminate it completely.

Second, perhaps use seqlock_t instead of a seqcount_t for xt_recseq?
This eliminates the problem. But given that it has been using
seqcount_t for a long time, and is still using it, and nobody else has
had this issue appear, makes me wonder if this problem isn't something
unique to the RT patches. Perhaps only use seqlock_t in built for
PREEMPT_RT_FULL?

Third, recent RT patches (such as 3.6.2-rt4) have added
preempt_disable_rt() and preempt_enable_rt() calls inside of
read_seqcount_begin() and write_seqcount_end() respectively. The call
to local_bh_disable/local_bh_enable doesn't do anything to
disable/enable preemption (in 3.6.2-rt4), but it does in 3.03.36-rt58.
But in 3.0.36-rt58 it only did so if not in an atomic context. And
it doesn't appear to be an atomic context since local_bh_disable
increments preempt_count by SOFTIRQ_OFFSET. And it appeared that
building with SMP enabled, even though it did call
preempt_disable_rt(), indirectly, through local_bh_disable(), but the
lockup still occurred.

Finally, after more testing, if this patch proves a solution to the
problem, we could apply it locally. But what kind of testing would be
required as part of a submission back to the general kernel/RT folk?
The patch is easy enough to generate, but if it can't be proven that
this patch actually fixes anything, I fail to see how it would be
useful.

Thanks,
Pete LaDow
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/