Re: [PATCH] x86: create a non-zero sized bm_pte only when needed

From: Jeremy Fitzhardinge
Date: Tue Mar 17 2009 - 14:33:44 EST


Jan Beulich wrote:
Does this depend on CONFIG_EARLY_PRINTK_DBGP being set? And what's so special about FIX_DBGP_BASE, that we should hard-code it in here? Is it just that its the first non-arch-dependent fixmap slot? Or something else? Will it break if we move FIX_DBGP_BASE?

No, it is indeed tied to that one fixmap entry, as this is what the 'early
initialization of the fixmap area' (commented such in head_32.S, and
uncommented equivalent exists in head_64.S) is about, albeit without
explicit tying to the respective fixmap entry (which makes this code
even more fragile than my change might seem).

I had to dig back to mid 2007 to find Eric's changeset referring to "USB debug", but as far as I can see the DBGP code didn't appear in-tree until mid 2008. That's a lot of archaeology to dig through to understand this change.

Is the space saving here just the 1 page for bm_pte[]?

Yes.

Wouldn't we do as well by making it initdata?

No, because the table may be retained past boot.

No, early_ioremaps are not allowed to exist beyond boot-time. The ioremap code will complain about any extant mappings in check_early_ioremap_leak().

But bm_pte might be used for other fixmap slots, so it can't really be released unless we carefully rearrange things.

I'm picking on this change because its breaking Xen PV booting...

Hmm, I don't think there's anything that should make it break. Any
details?

dmi_present() faults because the pointer returned from early_ioremap is bad: the L2 entry is empty. Xen boot doesn't go through head.S, and has no particular requirement for extremely early fixmap access, so there's no implicit fixmap pte setup.

user-defined physical RAM map:
user: 0000000000000000 - 00000000000a0000 (usable)
user: 00000000000a0000 - 0000000000100000 (reserved)
user: 0000000000100000 - 0000000000eaf000 (usable)
user: 0000000000eaf000 - 0000000000f32000 (reserved)
user: 0000000000f32000 - 0000000010000000 (usable)
(XEN) d1:v0: unhandled page fault (ec=0000)
(XEN) Pagetable walk from ffffffffff400000:
(XEN) L4[0x1ff] = 0000000078c80067 0000000000000203
(XEN) L3[0x1ff] = 0000000078c41067 0000000000000204
(XEN) L2[0x1fa] = 0000000000000000 ffffffffffffffff (XEN) domain_crash_sync called from entry.S
(XEN) Domain 1 (vcpu#0) crashed on cpu#1:
(XEN) ----[ Xen-3.4-unstable x86_64 debug=y Not tainted ]----
(XEN) CPU: 1
(XEN) RIP: e033:[<ffffffff8034e155>]
(XEN) RFLAGS: 0000000000000207 EM: 1 CONTEXT: pv guest
(XEN) rax: ffffffffff410000 rbx: ffffffff80665e88 rcx: 0000000000000003
(XEN) rdx: 000000000000000f rsi: ffffffffff400000 rdi: ffffffff80665e88
(XEN) rbp: ffffffff80665e78 rsp: ffffffff80665e30 r8: ffffffff80665c68
(XEN) r9: ffffffff80621080 r10: 0000000000000002 r11: 0000000000000519
(XEN) r12: ffffffffff400000 r13: ffffffffff400000 r14: 0000000000000000
(XEN) r15: 0000000000000000 cr0: 000000008005003b cr4: 00000000000026f0
(XEN) cr3: 0000000078e01000 cr2: ffffffffff400000



static __initdata int after_paging_init;
-static pte_t bm_pte[PAGE_SIZE/sizeof(pte_t)] __page_aligned_bss;
+#define __FIXADDR_TOP (-PAGE_SIZE)
Will this break in a 32-bit PV kernel where FIXADDR_TOP is shifted?

Not as long as the shifting happens in 2Mb steps (and when I wrote the
patch [which was a while back] I checked that there are other assumptions
about the shift only happening in 2Mb increments).

This seriously needs a good inline comment.

Why only is it always me who is asked for extensive inline comments, when
other code in the same area is happily being accepted without even being
self-commenting (which, if you read the construct carefully, I believe my
change is)? As noted above, the dependency on which page table slot
need early initialization is completely hidden behind hardcoded literal numbers
at least for x86-64. This is what indeed would need a comment (or better
yet, replacing of the hardcoded numbers by proper symbolics, in which
case I would think a comment would quickly become redundant).

The change needs to stand on its own. I'm fairly familiar with this area of this area of code, but the implicit dependency on the fixmap setup in head*.S wasn't at all obvious. I searched the tree for references to FIX_DBGP_BASE to work out why it was the slot you were using here, but again, I couldn't work it out. And conceptually, making the init of something as central as early_ioremap a side-effect of the init for a debug device just doesn't make much sense to me.

My concern is that this change makes everything a bit more brittle, with more non-obvious long-range dependencies which we'll need to keep under control as the code changes, but with only a tiny (potential) memory saving as an upside.

J
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/