No, the fastest and shortest way to do it is:
	.. dest in %edi ..
	movl $1024,%ecx		# count: 1024 longwords = 4096 bytes
	xorl %eax,%eax		# value to store: zero
	rep; stosl		# store %eax at (%edi), %ecx times
Which is in fact exactly how Linux does it.
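
For anyone who wants to try the same sequence from C, here is a minimal sketch using GCC-style inline asm. The function name zero_longs is made up for illustration and is not taken from any kernel source. It assumes the direction flag is clear, which the C calling convention guarantees on function entry, and it falls back to a plain loop on non-x86 compilers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (illustration only): zero `count` 32-bit words at `dest`. */
static void zero_longs(uint32_t *dest, size_t count)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /*
     * "+D" pins dest to %edi/%rdi, "+c" pins count to %ecx/%rcx,
     * and "a"(0) loads zero into %eax, matching the sequence above.
     * Assumes the direction flag is clear, so the pointer moves upward.
     */
    __asm__ volatile("rep stosl"
                     : "+D"(dest), "+c"(count)
                     : "a"(0)
                     : "memory");
#else
    /* Portable fallback: same effect, one word per iteration. */
    while (count--)
        *dest++ = 0;
#endif
}
```

Calling zero_longs(buf, 1024) clears 4096 bytes, the same amount as the snippet above.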
Of course, if you have an old x86 chip, that's your problem and you may 
not get optimal performance, but who expected anything else from old 
hardware?
(Hint: the above _really_ flies on a PPro. Intel optimized it to do 
cache-line accesses, it seems. They did the right thing.)
		Linus