Re: [PATCH 04/12] KVM: MMU: log dirty page after marking sptewritable

From: Marcelo Tosatti
Date: Wed Nov 20 2013 - 16:32:20 EST


On Wed, Nov 20, 2013 at 10:20:09PM +0800, Xiao Guangrong wrote:
> > But what guarantee does userspace require, from GET_DIRTY_LOG, while vcpus are
> > executing?
>
> Aha. Single calling GET_DIRTY_LOG is useless since new dirty page can be generated
> when GET_DIRTY_LOG is being returned. If user wants to get exact dirty pages the vcpus
> should be stopped.
>
> >
> > With fast page fault:
> >
> > if (cmpxchg64(sptep, spte, spte | PT_WRITABLE_MASK) == spte)
> > /* The window in here... */
> > mark_page_dirty(vcpu->kvm, gfn);
> >
> > And the $SUBJECT set_spte reordering, the rule becomes
> >
> > A call to GET_DIRTY_LOG guarantees to return correct information about
> > dirty pages before invocation of the previous GET_DIRTY_LOG call.
> >
> > (see example 1: the next GET_DIRTY_LOG will return the dirty information
> > there).
> >
>
> It seems no.
>
> The first GET_DIRTY_LOG can happen before fast-page-faultï
> the second GET_DIRTY_LOG happens in the window between cmpxchg()
> and mark_page_dirty(), for the second one, the information is still âincorrectâ.

Its correct for the previous GET_DIRTY_LOG call.

> > The rule for sptes that is, because kvm_write_guest does not match the
> > documentation at all.
>
> You mean the case of âkvm_write_guestâ is valid (I do not know why it is)?
> Or anything else?
>
> >
> > So before example 1 and this patch, the rule (well for sptes at least) was
> >
> > "Given a memory slot, return a bitmap containing any pages dirtied
> > since the last call to this ioctl. Bit 0 is the first page in the
> > memory slot. Ensure the entire structure is cleared to avoid padding
> > issues."
> >
> > Can you explain why it is OK to relax this rule?
>
> Itâs because:
> 1) it doesnât break current use cases, i.e. Live migration and FB-flushing.
> 2) the current code, like kvm_write_guest has already broken the documentation
> (the guest page has been written but missed in the dirty bitmap).
> 3) itâs needless to implement a exact get-dirty-pages since the dirty pages can
> no be exactly got except stopping vcpus.
>
> So i think we'd document this case instead. No?

Lets figure out the requirements, then. I don't understand why
FB-flushing is safe (think kvm-autotest: one pixel off the entire
test fails).

Before fast page fault: Pages are either write protected or the
corresponding dirty bitmap bit is set. Any write faults to dirty logged
sptes while GET_DIRTY log is executing in the protected section are
allowed to instantiate writeable spte after GET_DIRTY log is finished.

After fast page fault: Pages can be writeable and the dirty bitmap not
set. Therefore data can be dirty before GET_DIRTY executes and still
fail to be returned in the bitmap.

Since this patchset does not introduce change in behaviour (fast pf
did), no reason to avoid merging this.

BTW, since GET_DIRTY log does not require to report concurrent (to
GET_DIRTY_LOG) write faults, it is not necessary to rewalk the spte
list, is it?



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/