Re: [x86, vdso] BUG: unable to handle kernel paging request at d34bd000

From: Andy Lutomirski
Date: Sun Mar 09 2014 - 23:19:30 EST


On Sun, Mar 9, 2014 at 5:16 PM, H. Peter Anvin <hpa@xxxxxxxxxxxxxxx> wrote:
> On 03/09/2014 12:47 AM, Stefani Seibold wrote:
>>
>> But let me ask an other question: Is the compat mode still needed
>> anymore?
>>
>> Since Lguest, XEN, OPLC and the reservetop kernel parameter will change
>> the __FIXADDR_TOP, there is no fix place for the VDSO page. Also in the
>> 32 bit emulation layer the address is not fix.
>>
>> So all applications can fail when try directly access the VDSO page with
>> a hard coded address 0xffffe000.
>>
>> IMHO this is broken. So an other solution is to remove the whole VDSO
>> compat code.
>>
>
> Lguest, Xen, OLPC and reservetop are corner cases. My understanding is
> that at least one widely used distro actually cared about this, and
> Linus especially is adamant that "we don't break userspace."

OK, I did some research. I think that the commit that fixed the glibc bug was:

commit 49ad572a70b8aeb91e57483a11dd1b77e31c4468
Author: Ulrich Drepper <drepper@xxxxxxxxxx>
Date: Sat Feb 28 17:56:22 2004 +0000

    Update.

    * elf/rtld.c (dl_main): Adjust l->l_ld of the vDSO by l->l_addr.
    * sysdeps/generic/dl-sysdep.c (_dl_sysdep_start): Only set
      GL(dl_sysinfo) if non-zero.

I don't think that the actual load address of the VDSO matters at all.
Here's what I think is going on:

When the kernel is built, vdso32-int80.so looks like this (excerpted
from objdump -T):

DYNAMIC SYMBOL TABLE:
00000420 g DF .text 00000003 LINUX_2.5 __kernel_vsyscall
00000000 g DO *ABS* 00000000 LINUX_2.5 LINUX_2.5
00000410 g DF .text 00000008 LINUX_2.5 __kernel_rt_sigreturn
00000400 g DF .text 00000009 LINUX_2.5 __kernel_sigreturn

When the kernel is run, the kernel "relocates" the vdso, generating
something more like:

DYNAMIC SYMBOL TABLE:
ffffe420 g DF .text 00000014 LINUX_2.5 __kernel_vsyscall
00000000 g DO *ABS* 00000000 LINUX_2.5 LINUX_2.5
ffffe410 g DF .text 00000008 LINUX_2.5 __kernel_rt_sigreturn
ffffe400 g DF .text 00000009 LINUX_2.5 __kernel_sigreturn

That magic 0xffffe000 offset comes from VDSO_HIGH_BASE - VDSO_PRELINK,
and VDSO_PRELINK seems like an amazingly complicated way to say
"zero".

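A minimal sketch of that arithmetic, assuming VDSO_HIGH_BASE is
0xffffe000 (the default, with nothing pushing the fixmap down) and
VDSO_PRELINK is 0. The helper name vdso_runtime_addr is made up for
illustration and is not a kernel function:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for one configuration; Xen, lguest, OLPC or
 * reservetop would change VDSO_HIGH_BASE. */
#define VDSO_HIGH_BASE 0xffffe000u  /* where the compat vdso is mapped */
#define VDSO_PRELINK   0x0u         /* base address the image is linked at */

/* Hypothetical helper: address of a vdso symbol after "relocation". */
static uint32_t vdso_runtime_addr(uint32_t link_addr)
{
    /* The image has to fit between VDSO_HIGH_BASE and 4 GiB,
     * otherwise the sum would wrap around. */
    assert(link_addr < 0x2000u);
    return link_addr + (VDSO_HIGH_BASE - VDSO_PRELINK);
}
```

With these values, the link-time address 0x420 of __kernel_vsyscall
maps to 0xffffe420, matching the second table above.
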
Before the fix, it looks like glibc couldn't handle a vdso that was
mapped in such a way that its ELF headers didn't match its actual
location. Now it can. This is borne out by this message:

commit d4f7a2c18e59e0304a1c733589ce14fc02fec1bd
Author: Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Date: Wed May 2 19:27:12 2007 +0200

    [PATCH] i386: Relocate VDSO ELF headers to match mapped location with COMPAT

    Some versions of libc can't deal with a VDSO which doesn't have its
    ELF headers matching its mapped address. COMPAT_VDSO maps the VDSO at
    a specific system-wide fixed address. Previously this was all done at
    build time, on the grounds that the fixed VDSO address is always at
    the top of the address space. However, a hypervisor may reserve some
    of that address space, pushing the fixmap address down.

I suspect that it's entirely safe to map the 32-bit vdso wherever the
hell we want, so long as it's relocated to match the actual mapping
address. In principle it could even live outside the fixmap, as long
as the actual binary that gets run doesn't end up on top of it.
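
For reference, a userspace program that does not hard-code 0xffffe000
can ask where the vdso actually landed through the auxiliary vector.
This is only a sketch: getauxval() and AT_SYSINFO_EHDR are real glibc
interfaces (getauxval needs glibc 2.16 or later), while the two helper
names are made up for illustration:

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>
#include <sys/auxv.h>

/* Hypothetical helper: address at which the kernel mapped the vdso,
 * or NULL if the kernel did not provide one. */
static const unsigned char *vdso_base(void)
{
    return (const unsigned char *)getauxval(AT_SYSINFO_EHDR);
}

/* Hypothetical helper: does the mapping start with the ELF magic? */
static int vdso_looks_valid(const unsigned char *base)
{
    return base != NULL && memcmp(base, ELFMAG, SELFMAG) == 0;
}
```

A caller would check vdso_looks_valid(vdso_base()) before parsing the
headers found there, rather than assuming any particular address.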

So... I propose that we get rid of all the madness. Fix the vdso32
setup code to stop being insane. That means: stop memcpying the vdso
image anywhere and get rid of all references to the magical and wrong
number "3". Just map it wherever it needs to be mapped and relocate
the damn thing *in place*. If some RODATA crud gets in the way,
twiddle the protection bits as needed. That means that all this
"vvars before vdso" nonsense can go away.
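
A rough sketch of what relocating the dynamic symbol table in place
could look like, assuming the caller has already located the symbol
array inside the mapped image. The function name is made up, and a
real implementation would also have to adjust the program headers,
section headers and dynamic entries, which this omits:

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* Hypothetical helper: add `base` to every defined, non-absolute
 * symbol in place and return how many entries were adjusted. */
static size_t relocate_syms_in_place(Elf32_Sym *syms, size_t count,
                                     Elf32_Addr base)
{
    size_t adjusted = 0;

    assert(syms != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        /* Undefined and absolute symbols (such as the LINUX_2.5
         * version symbol) do not move with the image. */
        if (syms[i].st_shndx == SHN_UNDEF || syms[i].st_shndx == SHN_ABS)
            continue;
        syms[i].st_value += base;
        adjusted++;
    }
    return adjusted;
}
```

Because the array is modified where it sits, no second copy of the
image is needed; the page only has to be writable while this runs.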

(Of course, I haven't the faintest idea what l_addr in glibc means.
If there was a way to arrange for l_addr to be zero, then maybe none
of this would matter. Hmm, I wonder if just not relocating the vdso
at all would have the desired effect. Anyone out there understand
glibc?)

--Andy