Re: 2.6.25-rc1 xen pvops regression

From: Jeremy Fitzhardinge
Date: Thu Feb 21 2008 - 18:17:25 EST


Eric W. Biederman wrote:
I thought the problem was a Xen-provided pagetable from before Linux started?

The immediate symptom was that we have a page table at the address we
are doing the DMI probe. Xen does not allow pages tables to be mapped
read-write so early_ioremap get into trouble.

Correct.

We have a mystery:
- Why did the Xen domain builder or the linux kernel use 0xf0000 - 0x10000
for a page table.

It didn't. Ian confirmed the allocation comes from the kernel's pagetable setup. The domain builder always puts pagetables after the kernel, so over 1M.

It should be possible to instrument the early linux page allocation
and see what page pages linux is using to see who is doing weird
things.

Linux. Which suggests a latent bug.

We have possible solutions.
- Add a read-only flag to early_ioremap for use by our table scans.
- Don't do a DMI scan in the case of Xen.
- Fix the Xen domain builder.

1 or 2. 3 is not a problem.

All of that said. For DMI tables other early tables we should not
be writing to them anyway so learning to use read-only maps may be
the right solution. If the reason Xen was complaining was that
we were accessing an area that was not page tables but instead
should only be mapped read-only I would have a lot of sympathy
with that. As mapping areas that are architecturally ROMs read-write
is silly.

For PV guests, Xen itself isn't really interested in any of the platform bios/hardware (not even acknowledging its existence, let alone emulate anything). Definitely no magic memory ranges or scanning for signatures. The in-kernel Xen support has made some concessions to that sort of stuff, by making sure various memory ranges are valid to be scanned, even if they don't contain anything meaningful. We could do something like that for DMI if that turns out to be the best answer.

So guys can you please finish the root cause and really see why there
is a page table page at in 64K ROM BIOS area?

It seems to me that those pages are being handed out as heap pages by the early allocator. In the Xen case this is OK because there's nothing magic about them. But if real hardware doesn't reserve these pages in the E820 map, then they could end up being used as regular memory by mistake, which is an issue.

J
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/