Re: [PATCH] i386: fix vmalloc_sync_all() for Xen

From: Jan Beulich
Date: Thu Jun 19 2008 - 05:43:27 EST


>Given that the usermode PGDs will never need syncing, I think it would
>be better to use KERNEL_PGD_PTRS, and define
>
>#define sync_index(a) (((a) >> PMD_SHIFT) - KERNEL_PGD_BOUNDARY)
>
>for a massive 192 byte saving in bss.

I was considering that, too, but didn't do so for simplicity's sake. If I'll
have to re-spin the patch, I may as well do it.

>> + for (address = start; address >= TASK_SIZE; address += PMD_SIZE) {
>>
>
>Would it be better - especially for the Xen case - to only iterate from
>TASK_SIZE to FIXADDR_TOP rather than wrapping around? What will
>vmalloc_sync_one do on Xen mappings?

Could be done, but since there will never be any out-of-sync Xen entries,
it doesn't hurt doing the full pass. I agree it would possibly be more
correct,though.

>> + if (!test_bit(sync_index(address), insync)) {
>>
>It's probably worth reversing this test and removing a layer of indentation.

How? There's a second if() following this one, so we can't just 'continue;'
here.

>> spin_lock_irqsave(&pgd_lock, flags);
>> + if (unlikely(list_empty(&pgd_list))) {
>> + spin_unlock_irqrestore(&pgd_lock, flags);
>> + return;
>> + }
>>
>
>This seems a bit warty. If the list is empty, then won't the
>list_for_each_entry() just fall through? Presumably this only applies
>to boot, since pgd_list won't be empty on a running system with usermode
>processes. Is there a correctness issue here, or is it just a
>micro-optimisation?

No, it isn't. Note the setting to NULL of page, which after the loop gets
tested for. list_for_each_entry() would never yield a NULL page, even
if the list is empty. And

>> list_for_each_entry(page, &pgd_list, lru) {
>> if (!vmalloc_sync_one(page_address(page),
>> - address))
>> + address)) {
>> + BUG_ON(list_first_entry(&pgd_list,
>> + struct page,
>> + lru) != page);
>>
>
>What condition is this testing for?

This is a replacement of the BUG_ON() that an earlier patch from you
removed: Failure of vmalloc_sync_one() must happen on the first
entry or never, and this is what is being checked for here.

>> #else /* CONFIG_X86_64 */
>> /*
>> * Note that races in the updates of insync and start aren't
>>
>
>Any chance of unifying this with the very similar-looking loop below it?

Not at the same time I would say. As a subsequent thing, it might be
possible, though there are differences that make it desirable to keep
it distinct in my opinion.

>(I have to admit I don't understand why 64-bit needs to worry about
>syncing stuff. Doesn't it have enough pgds to go around? Is it because
>it wants to put modules within the same 2G chunk as the kernel?)

No, just like on 32-bit it's because modules loaded may access
vmalloc()-ed memory from notifiers that are called in contexts (NMI)
where taking even simple (propagation) page faults cannot be
tolerated (since the final IRET would result in finishing the NMI
handling from a CPU (or hypervisor) perspective.

Jan

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/