constraints and precious operand wastage

Michael L. Galbraith (mikeg@weiden.de)
Sun, 31 May 1998 17:09:29 +0200 (MET DST)


Hi All,

I can't produce a working system if I compile glibc with gcc-2.7.2.3,
and therefore have to use a new-breed compiler. I test these very
carefully and know for sure that 'bad' constraints do cause the
new-breed compilers to produce bad output. I also know that fixing
the constraints makes kernel/glibc/compilers a nice happy system.

Ergo, I've been 'fixing things' in my little test box.

For those who didn't read that thread, the constraint problem has been
defined as improper clobber list usage .. namely, that it is illegal to
include an input operand in the clobber list. A method of properly
clobbering an input was demonstrated: a dummy output to an unused variable.
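For reference, the dummy-output idiom looks roughly like this (a hypothetical
sketch in GNU C on x86; `csum_dummy` and its two-instruction body are
stand-ins for the real checksum code, not kernel source):

```c
#include <assert.h>

/*
 * Dummy-output idiom: the asm body destroys the register holding
 * 'len', so rather than naming that register in the clobber list
 * (illegal for an input), it is declared as an output tied to a
 * throwaway variable, which the compiler then optimizes away.
 */
static unsigned int csum_dummy(unsigned int sum, unsigned int len)
{
	unsigned int dead;	/* exists only to mark %1 as clobbered */

	__asm__("addl %3, %0\n\t"
		"xorl %1, %1"		/* body trashes len's register */
		: "=r" (sum), "=r" (dead)
		: "0" (sum), "1" (len));
	return sum;
}
```

The `"1" (len)` matching constraint forces the input into the same register
as the dummy output, so gcc knows that register's value dies in the asm.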

The above mentioned method works, but IMHO is not (um) Wonderbread..

(1) it uses up one additional operand, of the 10 maximum per
statement, for each clobbered input. This gets painful.

(2) it hurts readability of the code; it adds variables whose
purpose is to inform the compiler that a register is not
in a known state. These will always be optimized away.

I tried everything I could think of to come up with an alternate
solution, and finally arrived at something which looks nicer than
dummy outputs. Consider the following simple (minded?) approach.

--- checksum.c.org Sun May 31 13:09:16 1998
+++ checksum.c Sun May 31 13:08:16 1998
@@ -27,7 +27,6 @@
*/

unsigned int csum_partial(const unsigned char * buff, int len, unsigned int sum) {
-int dead1,dead2;
/*
* Experiments with ethernet and slip connections show that buff
* is aligned on either a 2-byte or 4-byte boundary. We get at
@@ -35,7 +34,7 @@
* Fortunately, it is easy to convert 2-byte alignment to 4-byte
* alignment for the unrolled loop.
*/
- __asm__("
+ __asm__ __volatile__("
testl $2, %%esi # Check alignment.
jz 2f # Jump if alignment is ok.
subl $2, %%ecx # Alignment uses up two bytes.
@@ -92,9 +91,9 @@
6: addl %%ecx,%%eax
adcl $0, %%eax
7: "
- : "=a"(sum), "=c" (dead1), "=S" (dead2)
- : "0"(sum), "1"(len), "2"(buff)
- : "bx", "dx");
+ : "=a"(sum)
+ : "0"(sum), "c"(len), "S"(buff));
+ __asm__ __volatile__("" :::"bx", "cx", "dx", "si");
return(sum);
}

@@ -237,8 +236,8 @@
: "=a" (sum)
: "m" (src_err_ptr), "m" (dst_err_ptr),
"0" (sum), "c" (len), "S" (src), "D" (dst),
- "i" (-EFAULT), "m"(tmp_var)
- : "bx", "cx", "dx", "si", "di" );
+ "i" (-EFAULT), "m"(tmp_var) );
+ __asm__ __volatile__("" :::"bx", "cx", "dx", "si", "di");

return(sum);
}

As you can see, I simply removed the clobbers and placed them in a
separate statement beneath the functional statement, and ensured
that these won't be moved by adding volatile.
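Boiled down to a standalone sketch (hypothetical x86/GNU C; the one-line
asm body here is only a stand-in for the real checksum loop), the pattern
is:

```c
#include <assert.h>

static unsigned int csum_split(unsigned int sum, unsigned int len)
{
	/* functional statement: lists only real outputs and inputs */
	__asm__ __volatile__("addl %2, %0"
			     : "=a" (sum)
			     : "0" (sum), "c" (len));
	/* separate statement carries the clobber list; volatile pins
	   it directly behind the statement whose damage it reports */
	__asm__ __volatile__("" : : : "cx");
	return sum;
}
```

No extra variables, no extra operands; the empty asm costs nothing at
run time since it emits no instructions.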

The reasoning is that since gcc doesn't even see the asm body, it can't
know what that body contains, ergo the need for a clobber list in the
first place. IFF a separate clobber list is supplied directly after
the functional statement, it must have the same effect without any
cost. (much asm reading|pretty heavily tested|machine still works:)

The only thing that I can see wrong with this method is the danger
of code relocation, which careful use of volatile should prevent.
Perhaps this could be overcome by encapsulating in a do{}while(0)?
Anything seriously dainbramaged about doing it this way?
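One possible shape for that encapsulation, sketched as a hypothetical
macro (note the do{}while(0) braces only group the two statements into
one C statement syntactically; it is still the volatiles that keep the
clobber statement glued behind the functional one):

```c
#include <assert.h>

/* Hypothetical wrapper keeping the functional asm and its clobber
   statement together as a single C statement. */
#define CSUM_ADD(sum, len)						\
do {									\
	__asm__ __volatile__("addl %2, %0"				\
			     : "=a" (sum)				\
			     : "0" (sum), "c" (len));			\
	__asm__ __volatile__("" : : : "cx");				\
} while (0)

static unsigned int csum_wrapped(unsigned int sum, unsigned int len)
{
	CSUM_ADD(sum, len);
	return sum;
}
```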

The second constraint set is impossible to implement using the dummy
output method without first breaking the statement into pieces, as it
would require 12 operands. Trying to avoid rewriting it inspired me
to try many imaginary solutions :))

Ciao,

-Mike

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu