Re: [RFC][PATCH 4/5] [PATCH 4/5] kvm-ept-idle: EPT page table walk for A bits

From: Dave Hansen
Date: Thu Sep 06 2018 - 10:35:39 EST


On 09/01/2018 04:28 AM, Fengguang Wu wrote:
> (2) would need fundemental changes to the interface. It seems existing solutions
> for sparse files like SEEK_HOLE/SEEK_DATA and FIEMAP ioctl may not serve this
> situation well. The most efficient way could be to fill user space read()
> buffer with an array of small extents:

I've only been doing kernel development a few short years, but I've
learned that designing user/kernel interfaces is hard.

A comment in an RFC saying that we need "fundamental changes to the
interface" seems to be more of a cry for help than a request for
comment. This basically says to me: ignore the interface, it's broken.

> This borrows host page table walk macros/functions to do EPT walk.
> So it depends on them using the same level.

Have you actually run this code?

How does this work? It's walking the 'ept_root' that appears to be a
guest CR3 register value. It doesn't appear to be the host CR3 value of
the qemu (or other hypervisor).

I'm also woefully confused why you are calling these EPT walks and then
walking the x86-style page tables. EPT tables don't have the same
format as x86 page tables, plus they don't start at a CR3-provided value.

I'm also rather unsure that when running a VM, *any* host-format page
tables get updated A/D bits. You need a host vaddr to do a host-format
page table walk in the host page tables, and the EPT tables do direct
guest physical to host physical translation. There's no host vaddr
involved at all in those translations.

> + if (!ept_pte_present(*pte) ||
> + !ept_pte_accessed(*pte))
> + idle = 1;

Huh? So, !Present and idle are treated the same? If you had large
swaths of !Present memory, you would see that in this interface and say,
"gee, I've got a lot of idle memory to migrate" and then go do a bunch
of calls to migrate it? That seems terminally wasteful.

Who is going to use this?

Do you have an example, at least dummy app?

Can more than one app read this data at the same time? Who manages it?
Who owns it? Are multiple reads destructive?

This entire set is almost entirely comment-free except for the
occasional C++ comments. That doesn't inspire confidence.