Re: [netfilter bug] BUG: using smp_processor_id() in preemptible[00000000] code: ssh/9115, caller is ipt_do_table+0xc8/0x559

From: Paul E. McKenney
Date: Sat Apr 04 2009 - 13:23:25 EST


On Thu, Apr 02, 2009 at 11:16:06PM +0200, Ingo Molnar wrote:
> * Ingo Molnar <mingo@xxxxxxx> wrote:
> > * Eric Dumazet <dada1@xxxxxxxxxxxxx> wrote:
> > > David put a fix for that into his tree a few hours ago
> > >
> > > commit fa9a86ddc8ecd2830a5e773facc250f110300ae7
> > >
> > > (netfilter: iptables: lock free counters) forgot to disable BH
> > > in arpt_do_table(), ipt_do_table() and ip6t_do_table()
> > >
> > > Using rcu_read_lock_bh() instead of rcu_read_lock() cures the problem.
> >
> > ok, got your fix (attached below), thanks Eric for the pointer.
> >
> > But i think my fix might be slightly better, because it does not
> > manipulate the preempt counter and leaves preemption enabled.
> >
> > There's no BH context worries since this code did not seem to have
> > BH protection before either. (it used a plain read_lock(), not
> > read_lock_bh(), AFAICS)
> >
> > I don't see any preemption worries either. I must be missing
> > something :)
>
> as per the other mail - what i missed was that the old code _did_
> use read_lock_bh(), which did not get carried over into the
> rcu_read_lock().
>
> So this fix affects basically all things netfilter, not just
> rcu-preempt - a plain rcu_read_lock() doesn't protect against BH
> context interaction.

Strangely enough, the original motivation for rcu_read_lock_bh() does not
apply to -rt kernels. The problem was that denial-of-service workloads
could apply such a heavy interrupt load to a given CPU that it never
got back to process-level execution, thus never passing through any
quiescent states.

So rcu-bh has softirq-level quiescent states, which solves that problem,
but at the cost of disabling softirq (and thus preemption) across its
read-side critical sections.

But in -rt, every point in the code not covered by rcu_read_lock()
is a quiescent state, so -rt should not be vulnerable to that particular
denial-of-service attack. However, rcu-bh has the additional semantic of
excluding BH execution while under rcu_read_lock_bh(), which appears
to be what this case relies on, and probably others as well.
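
To make the BH-exclusion point concrete, here is a rough userspace model,
not kernel code. Everything with a model_ prefix is hypothetical and made
up for illustration. The only thing it assumes about the real primitives is
that rcu_read_lock_bh() disables softirq processing for the duration of the
read-side critical section while plain rcu_read_lock() does not. A "softirq"
that arrives in the middle of a process-level counter update either runs
immediately (plain lock) or is deferred until the unlock (bh lock):

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only. All model_* names are hypothetical stand-ins. */
static int  model_bh_disable_depth; /* models the softirq-disable nesting count */
static bool model_softirq_pending;  /* softirq raised while BH was disabled */
static int  model_softirq_runs;     /* number of times the "softirq" ran */
static long model_counter;          /* models a per-CPU rule counter */
static bool model_in_update;        /* process level is mid-update */
static int  model_torn_updates;     /* softirq ran in the middle of an update */

/* The "softirq" also walks the table and bumps the same counter. */
static void model_softirq_handler(void)
{
	if (model_in_update)
		model_torn_updates++;   /* BH interfered with process level */
	model_counter++;
	model_softirq_runs++;
}

/* Run the softirq now, or defer it if BH is currently disabled. */
static void model_raise_softirq(void)
{
	if (model_bh_disable_depth > 0) {
		model_softirq_pending = true;
		return;
	}
	model_softirq_handler();
}

/* Plain RCU read side: no BH exclusion in this model. */
static void model_rcu_read_lock(void)   { }
static void model_rcu_read_unlock(void) { }

/* BH flavor: exclude the softirq until the matching unlock. */
static void model_rcu_read_lock_bh(void)
{
	model_bh_disable_depth++;
}

static void model_rcu_read_unlock_bh(void)
{
	assert(model_bh_disable_depth > 0);
	if (--model_bh_disable_depth == 0 && model_softirq_pending) {
		model_softirq_pending = false;
		model_softirq_handler(); /* deferred softirq runs here */
	}
}

/*
 * Models a process-level table walk during which a softirq arrives.
 * Returns how many times the softirq ran in the middle of the update.
 */
static int model_do_table(bool use_bh)
{
	int before = model_torn_updates;

	if (use_bh)
		model_rcu_read_lock_bh();
	else
		model_rcu_read_lock();

	model_in_update = true;
	model_raise_softirq();          /* packet arrives mid-update */
	model_counter++;
	model_in_update = false;

	if (use_bh)
		model_rcu_read_unlock_bh();
	else
		model_rcu_read_unlock();

	return model_torn_updates - before;
}
```

In this model, model_do_table(false) reports one interfering softirq and
model_do_table(true) reports none, with the deferred softirq still running
once at unlock time. It says nothing about grace periods or quiescent
states; it only illustrates the exclusion semantic discussed above.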

Interesting corner we have painted ourselves into here...

Thanx, Paul