Re: [PATCH] i386: fix vmalloc_sync_all() for Xen

From: Jeremy Fitzhardinge
Date: Wed Jun 18 2008 - 16:02:32 EST


Jan Beulich wrote:
Since the fourth PDPT entry cannot be shared under Xen,
vmalloc_sync_all() must iterate over pmd-s rather than pgd-s here.
Luckily, the code isn't used for native PAE (SHARED_KERNEL_PMD is 1)
and the change is benign to non-PAE.

Cc: Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Signed-off-by: Jan Beulich <jbeulich@xxxxxxxxxx>

---
arch/x86/mm/fault.c | 29 ++++++++++++++++++++---------
1 file changed, 20 insertions(+), 9 deletions(-)

--- linux-2.6.26-rc6/arch/x86/mm/fault.c 2008-06-18 09:56:16.000000000 +0200
+++ 2.6.26-rc6-i386-xen-vmalloc_sync_all/arch/x86/mm/fault.c 2008-06-06 08:51:52.000000000 +0200
@@ -921,32 +921,43 @@ void vmalloc_sync_all(void)
* start are only improving performance (without affecting correctness
* if undone).
*/
- static DECLARE_BITMAP(insync, PTRS_PER_PGD);
+#define sync_index(a) ((a) >> PMD_SHIFT)
+ static DECLARE_BITMAP(insync, PTRS_PER_PGD*PTRS_PER_PMD);

Given that the usermode PGDs will never need syncing, I think it would be better to use KERNEL_PGD_PTRS, and define

#define sync_index(a) (((a) >> PMD_SHIFT) - KERNEL_PGD_BOUNDARY)

for a massive 192-byte saving in bss.

static unsigned long start = TASK_SIZE;
unsigned long address;
if (SHARED_KERNEL_PMD)
return;
- BUILD_BUG_ON(TASK_SIZE & ~PGDIR_MASK);
- for (address = start; address >= TASK_SIZE; address += PGDIR_SIZE) {
- if (!test_bit(pgd_index(address), insync)) {
+ BUILD_BUG_ON(TASK_SIZE & ~PMD_MASK);
+ for (address = start; address >= TASK_SIZE; address += PMD_SIZE) {

Would it be better - especially for the Xen case - to only iterate from TASK_SIZE to FIXADDR_TOP rather than wrapping around? What will vmalloc_sync_one do on Xen mappings?

+ if (!test_bit(sync_index(address), insync)) {
It's probably worth reversing this test and removing a layer of indentation.
unsigned long flags;
struct page *page;
spin_lock_irqsave(&pgd_lock, flags);
+ if (unlikely(list_empty(&pgd_list))) {
+ spin_unlock_irqrestore(&pgd_lock, flags);
+ return;
+ }

This seems a bit warty. If the list is empty, then won't the list_for_each_entry() just fall through? Presumably this only applies to boot, since pgd_list won't be empty on a running system with usermode processes. Is there a correctness issue here, or is it just a micro-optimisation?

list_for_each_entry(page, &pgd_list, lru) {
if (!vmalloc_sync_one(page_address(page),
- address))
+ address)) {
+ BUG_ON(list_first_entry(&pgd_list,
+ struct page,
+ lru) != page);

What condition is this testing for?

+ page = NULL;
break;
+ }
}
spin_unlock_irqrestore(&pgd_lock, flags);
- if (!page)
- set_bit(pgd_index(address), insync);
+ if (page)
+ set_bit(sync_index(address), insync);
}
- if (address == start && test_bit(pgd_index(address), insync))
- start = address + PGDIR_SIZE;
+ if (address == start && test_bit(sync_index(address), insync))
+ start = address + PMD_SIZE;
}
+#undef sync_index
#else /* CONFIG_X86_64 */
/*
* Note that races in the updates of insync and start aren't

Any chance of unifying this with the very similar-looking loop below it?

(I have to admit I don't understand why 64-bit needs to worry about syncing stuff. Doesn't it have enough pgds to go around? Is it because it wants to put modules within the same 2G chunk as the kernel?)

J
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/