Re: [PATCH 2/8] xen/balloon: Move common memory reservation routines to a module

From: Oleksandr Andrushchenko
Date: Wed May 30 2018 - 13:46:14 EST


On 05/30/2018 06:54 PM, Boris Ostrovsky wrote:
On 05/30/2018 04:29 AM, Oleksandr Andrushchenko wrote:
On 05/29/2018 11:03 PM, Boris Ostrovsky wrote:
On 05/29/2018 02:22 PM, Oleksandr Andrushchenko wrote:
On 05/29/2018 09:04 PM, Boris Ostrovsky wrote:
On 05/25/2018 11:33 AM, Oleksandr Andrushchenko wrote:
@@ -463,11 +457,6 @@ static enum bp_state
increase_reservation(unsigned long nr_pages)
ÂÂÂÂÂÂ int rc;
ÂÂÂÂÂÂ unsigned long i;
ÂÂÂÂÂÂ struct pageÂÂ *page;
-ÂÂÂ struct xen_memory_reservation reservation = {
-ÂÂÂÂÂÂÂ .address_bits = 0,
-ÂÂÂÂÂÂÂ .extent_order = EXTENT_ORDER,
-ÂÂÂÂÂÂÂ .domidÂÂÂÂÂÂÂ = DOMID_SELF
-ÂÂÂ };
ÂÂ ÂÂÂÂÂ if (nr_pages > ARRAY_SIZE(frame_list))
ÂÂÂÂÂÂÂÂÂÂ nr_pages = ARRAY_SIZE(frame_list);
@@ -486,9 +475,7 @@ static enum bp_state
increase_reservation(unsigned long nr_pages)
ÂÂÂÂÂÂÂÂÂÂ page = balloon_next_page(page);
ÂÂÂÂÂÂ }
ÂÂ -ÂÂÂ set_xen_guest_handle(reservation.extent_start, frame_list);
-ÂÂÂ reservation.nr_extents = nr_pages;
-ÂÂÂ rc = HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation);
+ÂÂÂ rc = xenmem_reservation_increase(nr_pages, frame_list);
ÂÂÂÂÂÂ if (rc <= 0)
ÂÂÂÂÂÂÂÂÂÂ return BP_EAGAIN;
ÂÂ @@ -496,29 +483,7 @@ static enum bp_state
increase_reservation(unsigned long nr_pages)
ÂÂÂÂÂÂÂÂÂÂ page = balloon_retrieve(false);
ÂÂÂÂÂÂÂÂÂÂ BUG_ON(page == NULL);
ÂÂ -#ifdef CONFIG_XEN_HAVE_PVMMU
-ÂÂÂÂÂÂÂ /*
-ÂÂÂÂÂÂÂÂ * We don't support PV MMU when Linux and Xen is using
-ÂÂÂÂÂÂÂÂ * different page granularity.
-ÂÂÂÂÂÂÂÂ */
-ÂÂÂÂÂÂÂ BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE);
-
-ÂÂÂÂÂÂÂ if (!xen_feature(XENFEAT_auto_translated_physmap)) {
-ÂÂÂÂÂÂÂÂÂÂÂ unsigned long pfn = page_to_pfn(page);
-
-ÂÂÂÂÂÂÂÂÂÂÂ set_phys_to_machine(pfn, frame_list[i]);
-
-ÂÂÂÂÂÂÂÂÂÂÂ /* Link back into the page tables if not highmem. */
-ÂÂÂÂÂÂÂÂÂÂÂ if (!PageHighMem(page)) {
-ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ int ret;
-ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ ret = HYPERVISOR_update_va_mapping(
-ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ (unsigned long)__va(pfn << PAGE_SHIFT),
-ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ mfn_pte(frame_list[i], PAGE_KERNEL),
-ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ 0);
-ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ BUG_ON(ret);
-ÂÂÂÂÂÂÂÂÂÂÂ }
-ÂÂÂÂÂÂÂ }
-#endif
+ÂÂÂÂÂÂÂ xenmem_reservation_va_mapping_update(1, &page,
&frame_list[i]);

Can you make a single call to xenmem_reservation_va_mapping_update(rc,
...)? You need to keep track of pages but presumable they can be put
into an array (or a list). In fact, perhaps we can have
balloon_retrieve() return a set of pages.
This is actually how it is used later on for dma-buf, but I just
didn't want
to alter original balloon code too much, but this can be done, in
order of simplicity:

1. Similar to frame_list, e.g. static array of struct page* of size
ARRAY_SIZE(frame_list):
more static memory is used, but no allocations

2. Allocated at run-time with kcalloc: allocation can fail
If this is called in freeing DMA buffer code path or in error path then
we shouldn't do it.


3. Make balloon_retrieve() return a set of pages: will require
list/array allocation
and handling, allocation may fail, balloon_retrieve prototype change
balloon pages are strung on the lru list. Can we keep have
balloon_retrieve return a list of pages on that list?
First of all, before we go deep in details, I will highlight
the goal of the requested change: for balloon driver we call
xenmem_reservation_va_mapping_update(*1*, &page, &frame_list[i]);
from increase_reservation
and
xenmem_reservation_va_mapping_reset(*1*, &page);
from decrease_reservation and it seems to be not elegant because of
that one page/frame passed while we might have multiple pages/frames
passed at once.

In the balloon driver the producer of pages for increase_reservation
is balloon_retrieve(false) and for decrease_reservation it is
alloc_page(gfp).
In case of decrease_reservation the page is added on the list:
LIST_HEAD(pages);
[...]
list_add(&page->lru, &pages);

and in case of increase_reservation it is retrieved page by page
and can be put on a list as well with the same code from
decrease_reservation, e.g.
LIST_HEAD(pages);
[...]
list_add(&page->lru, &pages);

Thus, both decrease_reservation and increase_reservation may hold
their pages on a list before calling
xenmem_reservation_va_mapping_{update|reset}.

For that we need a prototype change:
xenmem_reservation_va_mapping_reset(<nr_pages>, <list of pages>);
But for xenmem_reservation_va_mapping_update it will look like:
xenmem_reservation_va_mapping_update(<nr_pages>, <list of pages>,
<array of frames>)
which seems to be inconsistent. Converting entries of the static
frame_list array
into corresponding list doesn't seem to be cute as well.

For dma-buf use-case arrays are more preferable as dma-buf constructs
scatter-gather
tables from array of pages etc. and if page list is passed then it
needs to be
converted into page array anyways.

So, we can:
1. Keep the prototypes as is, e.g. accept array of pages and use
nr_pages == 1 in
case of balloon driver (existing code)
2. Statically allocate struct page* array in the balloon driver and
fill it with pages
when those pages are retrieved:
static struct page *page_list[ARRAY_SIZE(frame_list)];
which will take additional 8KiB of space on 64-bit platform, but
simplify things a lot.
3. Allocate struct page *page_list[ARRAY_SIZE(frame_list)] dynamically

As to Boris' suggestion "balloon pages are strung on the lru list. Can
we keep have
balloon_retrieve return a list of pages on that list?"
Because of alloc_xenballooned_pages' retry logic for page retireval, e.g.
ÂÂÂ while (pgno < nr_pages) {
ÂÂÂ ÂÂÂ page = balloon_retrieve(true);
ÂÂÂ ÂÂÂ if (page) {
[...]
ÂÂÂ ÂÂÂ } else {
ÂÂÂ ÂÂÂ ÂÂÂ ret = add_ballooned_pages(nr_pages - pgno);
[...]
ÂÂÂ }
I wouldn't change things that much.

IMO, we can keep 1 page based API with the only overhead for balloon
driver of
function calls to xenmem_reservation_va_mapping_{update|reset} for
each page.


I still think what I suggested is doable but we can come back to it
later and keep your per-page implementation for now.

BTW, I also think you can further simplify
xenmem_reservation_va_mapping_* routines by bailing out right away if
xen_feature(XENFEAT_auto_translated_physmap). In fact, you might even
make them inlines, along the lines of

inline void xenmem_reservation_va_mapping_reset(unsigned long count,
struct page **pages)
{
#ifdef CONFIG_XEN_HAVE_PVMMU
if (!xen_feature(XENFEAT_auto_translated_physmap))
__xenmem_reservation_va_mapping_reset(...)
#endif
}
How about:

#ifdef CONFIG_XEN_HAVE_PVMMU
static inline __xenmem_reservation_va_mapping_reset(struct page *page)
{
[...]
}
#endif

and

void xenmem_reservation_va_mapping_reset(unsigned long count,
ÂÂÂ ÂÂÂ ÂÂÂ ÂÂÂ ÂÂÂ Âstruct page **pages)
{
#ifdef CONFIG_XEN_HAVE_PVMMU
ÂÂÂ if (!xen_feature(XENFEAT_auto_translated_physmap)) {
ÂÂÂ ÂÂÂ int i;

ÂÂÂ ÂÂÂ for (i = 0; i < count; i++)
ÂÂÂ ÂÂÂ ÂÂÂ __xenmem_reservation_va_mapping_reset(pages[i]);
ÂÂÂ }
#endif
}

This way I can use __xenmem_reservation_va_mapping_reset(page);
instead of xenmem_reservation_va_mapping_reset(1, &page);
Or some such.

-boris

-boris