Re: [PATCH] relayfs redux, part 3 II - another comment

From: Andi Kleen
Date: Sat Feb 05 2005 - 05:00:22 EST


Another comment on relay_write fast pathing

> +static inline unsigned relay_write(struct rchan *chan,
> +				   const void *data,
> +				   unsigned length)
> +{
> +	unsigned long flags;
> +	struct rchan_buf *buf = relay_get_buf(chan, smp_processor_id());
> +
> +	local_irq_save(flags);
> +	if (unlikely(buf->offset + length > chan->subbuf_size))
> +		length = relay_switch_subbuf(buf, length);
> +	memcpy(buf->data + buf->offset, data, length);

I said earlier gcc would optimize the memcpy, but thinking about
this again I'm not so sure anymore. The problem is that with a variable
buf->offset gcc cannot prove the alignment of the destination, and with
unknown alignment it tends to generate an out-of-line call to memcpy,
which is typically quite slow.

You can take a look at the generated assembly to see whether that's
true or not, but I suspect it is.

To avoid this it may be necessary to play

if (__builtin_constant_p(length))
switch (length) {
case 1: /* optimized version for 1 byte */ break;
case 2: ...
case 4: ...
case 8: ...
}
else
memcpy(...);

games like e.g. uaccess.h does. The problem is that gcc sometimes seems
to break __builtin_constant_p inside inline functions, so it may be
necessary to move this into a macro (I would double-check whether that
is really needed in this case though; the code is much nicer with an
inline)
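For reference, the macro variant could look something like this. This
is only a sketch: fast_copy is a made-up name, not part of the patch,
and the direct casts in the small cases assume an architecture like
x86 where unaligned accesses are cheap (other architectures would need
get_unaligned/put_unaligned-style helpers).

```c
#include <stdint.h>
#include <string.h>

/* Because the macro body is expanded directly into the caller,
 * __builtin_constant_p reliably sees constant lengths there, and the
 * small cases compile to single moves instead of an out-of-line
 * memcpy call.  Everything else falls back to memcpy. */
#define fast_copy(dst, src, len)					\
do {									\
	void *_d = (dst);						\
	const void *_s = (src);						\
	if (__builtin_constant_p(len)) {				\
		switch (len) {						\
		case 1: *(uint8_t *)_d = *(const uint8_t *)_s; break;	\
		case 2: *(uint16_t *)_d = *(const uint16_t *)_s; break;	\
		case 4: *(uint32_t *)_d = *(const uint32_t *)_s; break;	\
		case 8: *(uint64_t *)_d = *(const uint64_t *)_s; break;	\
		default: memcpy(_d, _s, (len)); break;			\
		}							\
	} else								\
		memcpy(_d, _s, (len));					\
} while (0)
```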

-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/