Re: [PATCH v2 09/16] rcu/tree: Maintain separate array for vmalloc ptrs

From: Matthew Wilcox
Date: Wed Jun 17 2020 - 20:52:33 EST


On Wed, Jun 17, 2020 at 04:46:09PM -0700, Paul E. McKenney wrote:
> > + // Handle two first channels.
> > + for (i = 0; i < FREE_N_CHANNELS; i++) {
> > + for (; bkvhead[i]; bkvhead[i] = bnext) {
> > + bnext = bkvhead[i]->next;
> > + debug_rcu_bhead_unqueue(bkvhead[i]);
> > +
> > + rcu_lock_acquire(&rcu_callback_map);
> > + if (i == 0) { // kmalloc() / kfree().
> > + trace_rcu_invoke_kfree_bulk_callback(
> > + rcu_state.name, bkvhead[i]->nr_records,
> > + bkvhead[i]->records);
> > +
> > + kfree_bulk(bkvhead[i]->nr_records,
> > + bkvhead[i]->records);
> > + } else { // vmalloc() / vfree().
> > + for (j = 0; j < bkvhead[i]->nr_records; j++) {
> > + trace_rcu_invoke_kfree_callback(
> > + rcu_state.name,
> > + bkvhead[i]->records[j], 0);
> > +
> > + vfree(bkvhead[i]->records[j]);
> > + }
> > + }
> > + rcu_lock_release(&rcu_callback_map);
>
> Not an emergency, but did you look into replacing this "if" statement
> with an array of pointers to functions implementing the legs of the
> "if" statement? If nothing else, this would greatly reduced indentation.

I don't think that replacing direct function calls with indirect function
calls is a great suggestion with the current state of play around branch
prediction.

I'd suggest:

rcu_lock_acquire(&rcu_callback_map);
trace_rcu_invoke_kfree_bulk_callback(rcu_state.name,
bkvhead[i]->nr_records, bkvhead[i]->records);
if (i == 0) {
kfree_bulk(bkvhead[i]->nr_records,
bkvhead[i]->records);
} else {
for (j = 0; j < bkvhead[i]->nr_records; j++) {
vfree(bkvhead[i]->records[j]);
}
}
rcu_lock_release(&rcu_callback_map);

But I'd also suggest a vfree_bulk be added. There are a few things
which would be better done in bulk as part of the vfree process
(we batch them up already, but i'm sure we could do better).