Re: [PATCH] Xen: Fix retry calls into PRIVCMD_MMAPBATCH*.

From: Andres Lagar-Cavilla
Date: Thu Aug 01 2013 - 09:30:22 EST

On Aug 1, 2013, at 8:04 AM, David Vrabel <david.vrabel@xxxxxxxxxx> wrote:

> On 01/08/13 12:49, Andres Lagar-Cavilla wrote:
>> On Aug 1, 2013, at 7:23 AM, David Vrabel <david.vrabel@xxxxxxxxxx> wrote:
>>> On 01/08/13 04:30, Andres Lagar-Cavilla wrote:
>>>> -- Resend as I haven't seen this hit the lists. Maybe some smtp misconfig. Apologies. Also expanded cc --
>>>> When a foreign mapper attempts to map guest frames that are paged out,
>>>> the mapper receives an ENOENT response and will have to try again
>>>> while a helper process pages the target frame back in.
>>>> Gating checks on PRIVCMD_MMAPBATCH* ioctl args were preventing retries
>>>> of mapping calls.
>>> This breaks the auto_translated_physmap case, as it will allocate another
>>> set of empty pages and leak the previous set.
>> David,
>> I'm not able to follow you here. Under what circumstances will another
>> set of empty pages be allocated? And where? Are we talking about page table pages?
> ....
>     vma = find_vma(mm, m.addr);
>     if (!vma ||
>         vma->vm_ops != &privcmd_vm_ops ||
>         (m.addr != vma->vm_start) ||
>         ((m.addr + (nr_pages << PAGE_SHIFT)) != vma->vm_end) ||
>         !privcmd_enforce_singleshot_mapping(vma)) {
>         up_write(&mm->mmap_sem);
>         ret = -EINVAL;
>         goto out;
>     }
>     if (xen_feature(XENFEAT_auto_translated_physmap)) {
>         ret = alloc_empty_pages(vma, m.num);
> Here.

Right, right, right. Excellent observation, thanks. I forward-ported this from 3.4 and it slipped through the cracks. OK, V2 coming.
>         if (ret < 0) {
>             up_write(&mm->mmap_sem);
>             goto out;
>         }
>     }
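
To make the leak concrete, here is a minimal userspace model of the problem and of one possible guard. It is only a sketch: struct fake_vma, fake_alloc_empty_pages(), map_batch_guarded() and fake_vma_release() are hypothetical names invented for illustration, not privcmd code, and the guard shown is an assumption about how a retry could avoid re-allocating, not a description of the V2 patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel VMA; only the field needed here. */
struct fake_vma {
    void *private_data;   /* models the page set hanging off the VMA */
};

int alloc_calls;          /* counts how many page sets were allocated */

/* Models an unconditional allocation: attaches a fresh page set.
 * If a set was already attached, the old pointer is overwritten (leaked). */
int fake_alloc_empty_pages(struct fake_vma *vma, size_t nr)
{
    void *pages = calloc(nr, sizeof(void *));
    if (!pages)
        return -1;
    alloc_calls++;
    vma->private_data = pages;
    return 0;
}

/* Guarded variant: allocate only on the first call for this VMA,
 * so that a retried mapping call reuses the existing page set. */
int map_batch_guarded(struct fake_vma *vma, size_t nr)
{
    assert(nr > 0);
    if (vma->private_data != NULL)
        return 0;         /* retry path: nothing new to allocate */
    return fake_alloc_empty_pages(vma, nr);
}

/* Releases the page set when the (model) VMA goes away. */
void fake_vma_release(struct fake_vma *vma)
{
    free(vma->private_data);
    vma->private_data = NULL;
}
```

Calling map_batch_guarded() twice on the same fake_vma allocates once. Calling fake_alloc_empty_pages() twice instead would overwrite the first pointer, which is the leak described above.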
>>> This privcmd_enforce_singleshot_mapping() stuff seems very odd anyway.
>>> Does anyone know what it was for originally? It would be preferable if
>>> we could update the mappings with a new set of foreign MFNs without
>>> having to tear down the VMA and recreate a new VMA.
>> I believe it's mostly historical. I agree with you in principle, but recreating VMAs is super-cheap.
> Tearing them down is not cheap as each page requires a trap-and-emulate
> to clear the PTE (see ptep_get_and_clear_full() in zap_pte_range()).

You need to tell the hypervisor to drop the ref on the mapped page. So you'd need a hypercall (arguably a multicall) to do that, which is not free. Then you'd need privcmd and libxc to agree on reusing the VMA, which has very low value in itself, being just a piece of metadata. And you still need to deal with cleaning up the mapped refs when the mapping process crashes.

So a whole lot of new complexity for small value, imho.

That's probably the whole point of the singleshot: don't forget you have something mapped in there. Because if you do, you might leak the ref forever.
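
For reference, the retry pattern from the patch description (the mapper gets ENOENT for a paged-out frame and tries again while a helper pages it back in) can be sketched as a small userspace model. Everything here is an assumption made for illustration: struct fake_frame, fake_map_batch() and map_with_retry() are hypothetical and do not correspond to the privcmd ioctl interface or to libxc.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of a guest frame that may be paged out: it becomes
 * mappable after attempts_until_present failed attempts, standing in for
 * the helper process paging it back in. */
struct fake_frame {
    int attempts_until_present;
    int mapped;
};

/* Models one batch mapping call. Fills err[i] with 0 or -ENOENT and
 * returns the number of frames that still need a retry. Frames that
 * are already mapped are left alone. */
size_t fake_map_batch(struct fake_frame *frames, int *err, size_t n)
{
    size_t pending = 0;
    for (size_t i = 0; i < n; i++) {
        if (frames[i].mapped) {
            err[i] = 0;
        } else if (frames[i].attempts_until_present > 0) {
            frames[i].attempts_until_present--;
            err[i] = -ENOENT;   /* still paged out: caller must retry */
            pending++;
        } else {
            frames[i].mapped = 1;
            err[i] = 0;
        }
    }
    return pending;
}

/* Retries the batch until every frame is mapped or max_tries calls
 * have been made. Returns the number of calls made, or -1 on giving up. */
int map_with_retry(struct fake_frame *frames, int *err, size_t n, int max_tries)
{
    assert(max_tries > 0);
    for (int tries = 1; tries <= max_tries; tries++) {
        if (fake_map_batch(frames, err, n) == 0)
            return tries;
    }
    return -1;
}
```

The point of the model is that the second and later calls target the same range as the first, which is exactly what a strict gating check on the ioctl arguments would reject.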

> David
