Re: [patch 02/11] PAT x86: Map only usable memory in x86_64 identity map and kernel text

From: Andi Kleen
Date: Thu Jan 10 2008 - 17:32:59 EST


On Thu, Jan 10, 2008 at 02:25:29PM -0800, Pallipadi, Venkatesh wrote:
>
>
> >-----Original Message-----
> >From: linux-kernel-owner@xxxxxxxxxxxxxxx
> >[mailto:linux-kernel-owner@xxxxxxxxxxxxxxx] On Behalf Of Andi Kleen
> >Sent: Thursday, January 10, 2008 1:17 PM
> >To: Pallipadi, Venkatesh
> >Cc: Andi Kleen; ebiederm@xxxxxxxxxxxx; rdreier@xxxxxxxxx;
> >torvalds@xxxxxxxxxxxxxxxxxxxx; gregkh@xxxxxxx;
> >airlied@xxxxxxxxx; davej@xxxxxxxxxx; mingo@xxxxxxx;
> >tglx@xxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Siddha, Suresh B
> >Subject: Re: [patch 02/11] PAT x86: Map only usable memory in
> >x86_64 identity map and kernel text
> >
> >> I think it is unsafe to access any reserved areas through
> >"WB" not just
> >> mmio regions. In the above case 0xe0000000-0xf0000000 is one such
> >> region.
> >
> >That is 2MB aligned.
>
> That e820 also has a reserved here at 0x9d000.

That's not a hole

>
> BIOS-e820: 0000000000000000 - 000000000009cc00 (usable)
> BIOS-e820: 000000000009cc00 - 00000000000a0000 (reserved)
> BIOS-e820: 00000000000cc000 - 00000000000d0000 (reserved)
>
> If we keep mapping for such pages, it will be problematic as if a driver
> later does a ioremap, then we have to go through split-pages and cpa.

It should not do ioremap uncacheable from reserved because there
shouldn't be a MMIO hole in there. It can do ioremap_cachable()
but that is ok.

>
> Most of the holes/reserved areas will be 2M aligned, other than initial
> 2M and possible 2M around ACPI region. So, we may end up mapping some of
> those pages with small pages. Even though it was not enforced until now,
> I feel that is required for correctness.

If it's rare enough mapping in 2MB chunks around the holes is ok.
>
> >> >
> >> >Exactly it's already broken.
> >> >
> >> >Anyways if someone accesses mmio through /dev/mem I think they
> >> >definitely
> >> >want the real mappings, not a zero page. And dev/mem
> >should provide.
> >> >The trick is just to do it without caching attribute violations,
> >> >but with mattr it is possible.
> >>
> >> I don't like /dev/mem supporting access to mmio. We do not know what
> >
> >But it always did that. I'm sure you'll break stuff if you forbid
> >it suddenly.
> >
> >> attributes to use for these regions. We can potentially map
> >all these
> >> pages uncacheable.
> >
> >That is what current /dev/mem does.
>
> May be I am missing something. But, I don't think I saw /dev/mem
> checking whether some region is reserved and mapping those pages as
> uncacheable.

It relies partly on the MTRRs and partly checks for >= end_pfn.
Yes it's a gross hack, but it works.

> As I though, its mostly done as MTRR has such setting. If I
> do dd of devmem which ends up reading all reserved regions today, I see
> one of my systems dying horribly with NMI dazed and confused and the
> other gets SCSI errors etc. I am not sure how can some apps depend on
> reading mmio regions through /dev/mem. Any particular app you are
> thinking about?

The older X servers for once or x86emu in user space and likely various
others. There are all kind of scary utilities using /dev/mem around
(like BIOS flash updaters etc.)

I know some people who don't trust the VM for large memory ares
like to boot with small mem=... and then grab memory through /dev/mem.
I suspect if that didn't work anymore there would be eventually
complaints too although there might be a case be made for not
supporting that anymore.

BUt really /dev/mem is widely used and full compatibility is fairly
important.


> Other than the complicated code, do you see any issues of identity
> mapping only "usable" and "ACPI" regions as per e820? We can possible
> try to simplify the code, if that is the only concern.

The basic idea is fine.

-Andi

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/