Something very strange on x86_64 2.6.X kernels

From: Eric Dumazet
Date: Thu Jan 20 2005 - 15:58:21 EST


Hi Andi

I have very strange coredumps happening on a big 64bits program.

Some background :
- This program is multi-threaded
- Machine is a dual Opteron 248 machine, 12GB ram.
- Kernel 2.6.6 (tried 2.6.10 too but problems too)
- The program uses hugetlb pages.
- The program uses prefetchnta
- The program uses about 8GB of ram.

After numerous differents core dumps of this program, and gdb debugging I found :

Every time the crash occurs when one thread is using some ram located at virtual address 0xffffe6xx

When examining the core image, the data saved on this page seems correct (ie countains coherent user data). But one register (%rbx) is usually corrupted and contains a small value (like 0x3c)

The last instruction using this register is :
prefetchnta 0x18(,%rbx,4)


Examining linux sources, I found that 0xffffe000 is 'special' (ia 32 vsyscall) and 0xffffe600 is about sigreturn subsection of this special area.

Is it possible some vm trick just kicks in and corrupts my true 64bits program ?

Thank you
Eric Dumazet
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/