E.g. because __writel_be(x,y) can be optimized much more than
__writel(cpu_to_be32(x),y)?
I mean it is the same reason why both cpu_to_be32p and cpu_to_be32 exist.
The former is one instruction on sparc64 (a memory load with byte
swapping), the latter is a bunch of instructions (the load, several shifts
and several maskings). On sparc64 in particular, __writel == writel,
because ioremap (which is just "return physical address + magic", btw)
makes sure the side-effect bit is set for the pages, but I guess other
architectures have similar byte-swapping instructions.
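To make the value-versus-pointer distinction concrete, here is a minimal
user-space sketch. The names sketch_swab32 and sketch_swab32p are made up
for illustration and are not the kernel's macros; the pointer form below is
only the generic fallback that an architecture could replace with its own
byte-swapping load.

```c
#include <stdint.h>

/* Hypothetical value form: the word is already in a register, so the
 * swap has to be done with shifts and masks. */
static uint32_t sketch_swab32(uint32_t x)
{
	return ((x & 0x000000ffU) << 24) |
	       ((x & 0x0000ff00U) <<  8) |
	       ((x & 0x00ff0000U) >>  8) |
	       ((x & 0xff000000U) >> 24);
}

/* Hypothetical pointer form: the caller hands over the address, so the
 * load and the swap are one operation from the caller's point of view.
 * This generic version is just load + swap; an architecture with a
 * byte-swapping load could implement the whole body as that one load. */
static uint32_t sketch_swab32p(const uint32_t *p)
{
	return sketch_swab32(*p);
}
```

With v = 0x11223344, both sketch_swab32(v) and sketch_swab32p(&v) return
0x44332211; only the pointer form leaves room for the single-instruction
implementation.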
One could argue that gcc could be taught to optimize sequences like
	x = *(u32 *)addr;
	x = ((__u32)( \
		(((__u32)(x) & (__u32)0x000000ffUL) << 24) | \
		(((__u32)(x) & (__u32)0x0000ff00UL) <<  8) | \
		(((__u32)(x) & (__u32)0x00ff0000UL) >>  8) | \
		(((__u32)(x) & (__u32)0xff000000UL) >> 24) ));
into
	lduwa [addr] ASI_PL, x
but it won't be easy, and I wonder whether such sequences are used enough
outside of the kernel to be really worth implementing.
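As a rough illustration of what the compiler would have to prove, the
sketch below (plain user-space C, with the hypothetical helper names
load_then_swap and load_swapped) checks that "native load, then shifts and
masks" gives the same result as reading the four bytes in reverse order,
which is what a single byte-swapping load would deliver.

```c
#include <stdint.h>
#include <string.h>

/* Native load followed by the shift-and-mask swap (the quoted sequence). */
static uint32_t load_then_swap(const void *addr)
{
	uint32_t x;

	memcpy(&x, addr, sizeof(x));	/* alignment-safe native load */
	return ((x & 0x000000ffU) << 24) |
	       ((x & 0x0000ff00U) <<  8) |
	       ((x & 0x00ff0000U) >>  8) |
	       ((x & 0xff000000U) >> 24);
}

/* Model of a byte-swapping load: same four bytes, opposite byte order. */
static uint32_t load_swapped(const void *addr)
{
	const unsigned char *b = addr;
	unsigned char r[4] = { b[3], b[2], b[1], b[0] };
	uint32_t x;

	memcpy(&x, r, sizeof(x));
	return x;
}
```

The two functions agree for any input on both big-endian and little-endian
hosts, so the transformation itself is sound; the hard part is getting the
compiler to recognize the pattern.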
Cheers,
Jakub
___________________________________________________________________
Jakub Jelinek | jj@sunsite.mff.cuni.cz | http://sunsite.mff.cuni.cz
Administrator of SunSITE Czech Republic, MFF, Charles University
___________________________________________________________________
UltraLinux | http://ultra.linux.cz/ | http://ultra.penguin.cz/
Linux version 2.3.13 on a sparc64 machine (1343.49 BogoMips)
___________________________________________________________________