Re: [PATCH] netfilter: use per-cpu spinlock rather than RCU

From: Andrew Morton
Date: Mon Apr 13 2009 - 18:34:49 EST


On Mon, 13 Apr 2009 09:53:09 -0700
Stephen Hemminger <shemminger@xxxxxxxxxx> wrote:

> This is an alternative version of ip/ip6/arp tables locking using
> per-cpu locks. It avoids the overhead of synchronize_net() during
> updates while still removing the expensive rwlock of earlier versions.
>
> The idea for this came from an earlier version done by Eric Dumazet.
> Locking is done per-cpu: the fast path takes only the current CPU's
> lock and updates counters, while the slow path acquires the locks on
> all CPUs.
>
> The mutex that was added for 2.6.30 in xt_table is unnecessary, since
> xt[af].mutex is already held.
>
> Tested basic functionality (add/remove/list), but I don't have test
> cases for stress, ip6tables or arptables.
>
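For orientation, the scheme described above boils down to roughly the
following (a sketch reconstructed from the description, not the literal
patch code):

        static DEFINE_PER_CPU(spinlock_t, ip_tables_lock);

        /* Fast path: protect only this CPU's counter slab. */
        local_bh_disable();
        spin_lock(&__get_cpu_var(ip_tables_lock));
        /* ... update private->entries[smp_processor_id()] ... */
        spin_unlock(&__get_cpu_var(ip_tables_lock));
        local_bh_enable();

        /* Slow path: take each CPU's lock in turn while summing. */
        for_each_possible_cpu(cpu) {
                spin_lock_bh(&per_cpu(ip_tables_lock, cpu));
                /* ... fold entries[cpu] into the totals ... */
                spin_unlock_bh(&per_cpu(ip_tables_lock, cpu));
        }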
> unsigned int
> ipt_do_table(struct sk_buff *skb,
> @@ -339,9 +341,10 @@ ipt_do_table(struct sk_buff *skb,
>
> IP_NF_ASSERT(table->valid_hooks & (1 << hook));
>
> - rcu_read_lock_bh();
> - private = rcu_dereference(table->private);
> - table_base = rcu_dereference(private->entries[smp_processor_id()]);
> + local_bh_disable();
> + spin_lock(&__get_cpu_var(ip_tables_lock));

spin_lock_bh()?
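I.e. fold the BH-disable into the lock, roughly:

        spin_lock_bh(&__get_cpu_var(ip_tables_lock));
        ...
        spin_unlock_bh(&__get_cpu_var(ip_tables_lock));

One caveat with the folded form (an observation, not something in the
patch): the per-cpu address is evaluated before BH is disabled, so on a
preemptible kernel the task could migrate between computing
&__get_cpu_var(ip_tables_lock) and taking the lock, and then update the
new CPU's counters while holding the old CPU's lock.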

> + private = table->private;
> + table_base = private->entries[smp_processor_id()];
>
> e = get_entry(table_base, private->hook_entry[hook]);
>
> @@ -436,8 +439,8 @@ ipt_do_table(struct sk_buff *skb,
> e = (void *)e + e->next_offset;
> }
> } while (!hotdrop);
> -
> - rcu_read_unlock_bh();
> + spin_unlock(&__get_cpu_var(ip_tables_lock));
> + local_bh_enable();
>
> #ifdef DEBUG_ALLOW_ALL
> return NF_ACCEPT;
> @@ -891,86 +894,34 @@ get_counters(const struct xt_table_info
> struct xt_counters counters[])
> {
> unsigned int cpu;
> - unsigned int i;
> - unsigned int curcpu;
> + unsigned int i = 0;
> + unsigned int curcpu = raw_smp_processor_id();
>
> /* Instead of clearing (by a previous call to memset())
> * the counters and using adds, we set the counters
> * with data used by 'current' CPU
> * We dont care about preemption here.
> */
> - curcpu = raw_smp_processor_id();
> -
> - i = 0;
> + spin_lock_bh(&per_cpu(ip_tables_lock, curcpu));
> IPT_ENTRY_ITERATE(t->entries[curcpu],
> t->size,
> set_entry_to_counter,
> counters,
> &i);
> + spin_unlock_bh(&per_cpu(ip_tables_lock, curcpu));
>
> for_each_possible_cpu(cpu) {
> if (cpu == curcpu)
> continue;
> +
> i = 0;
> + spin_lock_bh(&per_cpu(ip_tables_lock, cpu));
> IPT_ENTRY_ITERATE(t->entries[cpu],
> t->size,
> add_entry_to_counter,
> counters,
> &i);
> - }

This would be racy against CPU hotplug if this code were hotplug-aware.

And it should be hotplug-aware, really. num_possible_cpus() can exceed
num_online_cpus(). The extent by which possible exceeds online is
debatable, but one can conceive of situations where it is "lots".
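A hotplug-aware walk would bracket the loop with get_online_cpus() /
put_online_cpus() and visit only online CPUs, roughly as below (a
sketch, not from the patch: the function name is made up, counters[] is
assumed pre-zeroed, and a CPU-hotplug notifier would still be needed to
fold in counts left behind by a CPU that has gone offline):

        /* Sketch: hotplug-safe summation of the per-cpu counters. */
        static void get_counters_hotplug_safe(const struct xt_table_info *t,
                                              struct xt_counters counters[])
        {
                unsigned int cpu;

                get_online_cpus();      /* hold off CPU hotplug */
                for_each_online_cpu(cpu) {
                        unsigned int i = 0;

                        spin_lock_bh(&per_cpu(ip_tables_lock, cpu));
                        IPT_ENTRY_ITERATE(t->entries[cpu], t->size,
                                          add_entry_to_counter,
                                          counters, &i);
                        spin_unlock_bh(&per_cpu(ip_tables_lock, cpu));
                }
                put_online_cpus();
        }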

Is lib/percpu_counter.c no good for this application? Unfixably no
good? That code automagically handles CPU hotplug.
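For reference, the percpu_counter interface is roughly as follows
(sketch only; each counter costs a per-cpu allocation, so whether one
percpu_counter per rule counter is affordable here is part of the
question):

        #include <linux/percpu_counter.h>

        struct percpu_counter pkts;

        percpu_counter_init(&pkts, 0);  /* also hooks the counter into
                                           the CPU-hotplug callback list */

        percpu_counter_add(&pkts, 1);           /* hot path, per packet */

        total = percpu_counter_sum(&pkts);      /* slow path: exact sum
                                                   across all CPUs */

        percpu_counter_destroy(&pkts);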

