Re: [PATCH] Percpu tag allocator

From: Oleg Nesterov
Date: Wed Jun 12 2013 - 15:18:52 EST


On 06/12, Kent Overstreet wrote:
>
> So we'd need at least an atomic counter, but a bitmap isn't really any
> more trouble and it lets us skip most of the percpu lists that are empty

Yes, yes, I understand.

> - which should make a real difference in scalability to huge nr_cpus.

But this is not obvious to me. I mean, I am not sure I understand why
all of this is "optimal". In particular, I do not really understand
"while (cpus_have_tags-- * TAG_CPU_SIZE > pool->nr_tags / 2)" in
steal_tags(), even if "the workload should not span more cpus than
nr_tags / 128" is true. I guess this is connected to "we guarantee that
nr_tags / 2 can always be allocated", and we do not want to call
steal_tags() more often than necessary, but cpus_have_tags * TAG_CPU_SIZE
can easily overestimate the number of free tags.
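
To make the overestimate concrete, here is a minimal userspace sketch. It
is not taken from the patch: TAG_CPU_SIZE = 128 is an assumed value, and
estimated_free(), actual_free() and count_cpus_with_tags() are
hypothetical helpers. It only compares the upper bound that the loop
condition works with against what is really sitting on the percpu
freelists.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed value for illustration only; the patch defines its own. */
#define TAG_CPU_SIZE 128u

/* Bound used by the loop condition: every counted cpu is treated as full. */
static unsigned estimated_free(unsigned cpus_have_tags)
{
	return cpus_have_tags * TAG_CPU_SIZE;
}

/* Number of tags that are really on the percpu freelists. */
static unsigned actual_free(const unsigned *nr_free, size_t nr_cpus)
{
	unsigned sum = 0;

	for (size_t i = 0; i < nr_cpus; i++) {
		assert(nr_free[i] <= TAG_CPU_SIZE);
		sum += nr_free[i];
	}
	return sum;
}

/* Number of cpus with a non-empty freelist, i.e. bits set in the bitmap. */
static unsigned count_cpus_with_tags(const unsigned *nr_free, size_t nr_cpus)
{
	unsigned cpus = 0;

	for (size_t i = 0; i < nr_cpus; i++)
		if (nr_free[i])
			cpus++;
	return cpus;
}
```

With percpu counts of {1, 0, 2, 1}, three cpus have tags, so the bound is
3 * 128 = 384 while only 4 tags are actually free.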

But I didn't read the patch carefully, and it is not that I think I
can suggest something better.

In short: please ignore ;)

> > > +enum {
> > > + TAG_FAIL = -1U,
> > > + TAG_MAX = TAG_FAIL - 1,
> > > +};
> >
> > This can probably go to .c, and it seems that TAG_MAX is not needed.
> > tag_init() can check "nr_tags >= TAG_FAIL" instead.
>
> Users need TAG_FAIL to check for allocation failure.

Ah, indeed, !__GFP_WAIT...

Hmm, but why gfp_t? Why not "bool atomic"?

Probably this is because you expect that most callers will have a gfp
mask anyway. It looks a bit strange, but I won't argue.

> > > + if (nr_free) {
> > > + memcpy(tags->freelist,
> > > + remote->freelist,
> > > + sizeof(unsigned) * nr_free);
> > > + smp_mb();
> > > + remote->nr_free = 0;
> > > + tags->nr_free = nr_free;
> > > + return true;
> > > + } else {
> > > + remote->nr_free = 0;
> > > + }
> >
> > Both branches clear remote->nr_free.
>
> Yeah, but clearing it has to happen after the memcpy() and the smp_mb().

Yes, yes, we need mb() between the memcpy() and "remote->nr_free = 0",

> I couldn't find a way to combine them that I liked, but if you've got
> any suggestions I'm all ears.

Please ignore. Somehow I missed the fact that we need to either return
or continue, so we need the "else" or another check.
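
For reference, a userspace sketch of the quoted fragment that shows why
the two branches stay separate. Everything here is an assumption made for
illustration: struct tag_cpu_freelist and steal_from() are hypothetical
names, TAG_CPU_SIZE = 128 is made up, and atomic_thread_fence() from C11
only stands in for the kernel's smp_mb(). The non-empty path has to copy,
issue the barrier, clear the remote count and report success, while the
empty path only clears the count and lets the caller move on to the next
cpu.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

#define TAG_CPU_SIZE 128u	/* assumed value, not from the patch */

/* Hypothetical stand-in for the percpu freelist in the patch. */
struct tag_cpu_freelist {
	unsigned nr_free;
	unsigned freelist[TAG_CPU_SIZE];
};

/*
 * Returns true if tags were taken from 'remote'; the caller returns in
 * that case and continues with the next cpu otherwise.
 */
static bool steal_from(struct tag_cpu_freelist *tags,
		       struct tag_cpu_freelist *remote)
{
	unsigned nr_free = remote->nr_free;

	assert(nr_free <= TAG_CPU_SIZE);

	if (nr_free) {
		memcpy(tags->freelist, remote->freelist,
		       sizeof(unsigned) * nr_free);
		/* Stand-in for smp_mb(): copy first, then clear the count. */
		atomic_thread_fence(memory_order_seq_cst);
		remote->nr_free = 0;
		tags->nr_free = nr_free;
		return true;
	} else {
		remote->nr_free = 0;
	}

	return false;
}
```

The clear could be hoisted below the if, but the caller would still have
to test nr_free a second time to decide between return and continue.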

Oleg.
