Re: [PATCH 1/3] x86-32: Separate 1:1 pagetables from swapper_pg_dir

From: Konrad Rzeszutek Wilk
Date: Tue Jan 18 2011 - 19:41:24 EST


On Thu, Nov 11, 2010 at 02:56:13PM +0100, Joerg Roedel wrote:
> This patch fixes machine crashes which occur when heavily exercising the
> CPU hotplug codepaths on a 32-bit kernel. These crashes are caused by
> AMD Erratum 383 and result in a fatal machine check exception. Here's
> the scenario:
>
> 1. On 32-bit, the swapper_pg_dir page table is used as the initial page
> table for booting a secondary CPU.
>
> 2. To make this work, swapper_pg_dir needs a direct mapping of physical
> memory in it (the low mappings). By adding those low, large page (2M)
> mappings (PAE kernel), we create the necessary conditions for Erratum
> 383 to occur.
>
> 3. Other CPUs which do not participate in the off- and onlining game may
> use swapper_pg_dir while the low mappings are present (when leave_mm is
> called). For all steps below, the CPU referred to is a CPU that is using
> swapper_pg_dir, and not the CPU which is being onlined.
>
> 4. The presence of the low mappings in swapper_pg_dir can cause TLB
> entries for addresses below __PAGE_OFFSET to be established
> speculatively. These TLB entries are marked global and large.
>
> 5. When a CPU with such a TLB entry switches to another page table, this
> TLB entry remains because it is global.
>
> 6. The process then generates an access to an address covered by the
> above TLB entry but there is a permission mismatch - the TLB entry
> covers a large global page not accessible to userspace.
>
> 7. Due to this permission mismatch, a new 4KB user TLB entry gets
> established. Further, Erratum 383 provides for a small window of time
> where both TLB entries are present. This results in an uncorrectable
> machine check exception signalling a TLB multimatch, which panics the
> machine.
>
> There are two ways to fix this issue:
>
> 1. Always do a global TLB flush when a new cr3 is loaded and the
> old page table was swapper_pg_dir. I consider this a hack that is
> hard to understand and has performance implications.
>
> 2. Do not use swapper_pg_dir to boot secondary CPUs, as is already
> the case on 64-bit.
>
> This patch implements solution 2. It introduces a trampoline_pg_dir
> which has the same layout as swapper_pg_dir with low_mappings. This page
> table is used as the initial page table of the booting CPU. Later in the
> bringup process, it switches to swapper_pg_dir and does a global TLB
> flush. This fixes the crashes in our test cases.
>
> -v2: switch to swapper_pg_dir right after entering start_secondary() so
> that we are able to access percpu data which might not be mapped in the
> trampoline page table.

You might also want to look at the regression this patch caused when it
was introduced, mainly at this fix:
805e3f495057aa5307ad4e3d6dc7073d4733c691