Read the fine print. I agree that my code snippet above doesn't really
work, but the basic idea _does_ work.
Intel will not actually flush the pre-fetch queue on writes to a
prefetched location. Intel will flush the pre-fetch queue on writes to the
same _linear_address_ as the prefetched location, which is not the same
thing at all. That check is very easy indeed to get around: you just map
the same physical page at two different linear addresses, and you modify
it through a different address than the one you execute from.
Boom.
This is something that can be used to fool any scheme that is based on
disassembling the instruction that caused the trap.
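
A minimal sketch of the double-mapping half of the trick, assuming a POSIX
system with mmap(). The names map_page_twice() and alias_write_visible()
are made up for illustration. It only shows that a store through one linear
address lands in the page seen through the other address; it does not
execute from the page or demonstrate the prefetch behaviour itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map the first page of an anonymous temporary file
 * twice with MAP_SHARED, so two different linear addresses are backed by
 * the same page. Returns 0 on success, -1 on failure. */
int map_page_twice(unsigned char **view_a, unsigned char **view_b)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    FILE *fp = tmpfile();
    if (fp == NULL)
        return -1;

    int fd = fileno(fp);
    if (ftruncate(fd, (off_t)page) != 0) {
        fclose(fp);
        return -1;
    }

    void *a = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    void *b = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    fclose(fp); /* existing mappings stay valid after the file is closed */

    if (a == MAP_FAILED || b == MAP_FAILED) {
        if (a != MAP_FAILED)
            munmap(a, (size_t)page);
        if (b != MAP_FAILED)
            munmap(b, (size_t)page);
        return -1;
    }

    *view_a = a;
    *view_b = b;
    return 0;
}

/* Hypothetical check: returns 1 if a write made through the second
 * address is visible through the first, 0 if not, -1 on setup failure. */
int alias_write_visible(void)
{
    unsigned char *a, *b;
    if (map_page_twice(&a, &b) != 0)
        return -1;

    long page = sysconf(_SC_PAGESIZE);

    a[0] = 0x90; /* stand-in byte for the "old" instruction */
    b[0] = 0xC3; /* modify the page through the other linear address */

    int visible = (a != b) && (a[0] == 0xC3);

    munmap(a, (size_t)page);
    munmap(b, (size_t)page);
    return visible;
}
```

Calling alias_write_visible() from a main() should return 1 on a system
where both mappings succeed. Actually running code out of one of the views
would additionally need PROT_EXEC on that mapping and real instruction
bytes, which this sketch deliberately leaves out.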
Linus