Re: [PATCH] zsmalloc: zs_page_migrate: not check inuse if migrate_mode is not MIGRATE_ASYNC

From: Minchan Kim
Date: Thu Jul 20 2017 - 04:49:07 EST


Hi Hui,

On Thu, Jul 20, 2017 at 02:39:17PM +0800, Hui Zhu wrote:
> Hi Minchan,
>
> I am sorry for answer late.
> I spent some time on ubuntu 16.04 with mmtests in an old laptop.
>
> 2017-07-17 13:39 GMT+08:00 Minchan Kim <minchan@xxxxxxxxxx>:
> > Hello Hui,
> >
> > On Fri, Jul 14, 2017 at 03:51:07PM +0800, Hui Zhu wrote:
> >> Got some -EBUSY from zs_page_migrate that will make migration
> >> slow (retry) or fail (zs_page_putback will schedule_work free_work,
> >> but it cannot ensure the success).
> >
> > I think EAGAIN(migration retrial) is better than EBUSY(bailout) because
> > expectation is that zsmalloc will release the empty zs_page soon so
> > at next retrial, it will be succeeded.
>
>
> I am not sure.
>
> This is the call trace of zs_page_migrate:
> zs_page_migrate
> mapping->a_ops->migratepage
> move_to_new_page
> __unmap_and_move
> unmap_and_move
> migrate_pages
>
> In unmap_and_move will remove page from migration page list
> and call putback_movable_page(will call mapping->a_ops->putback_page) if
> return value of zs_page_migrate is not -EAGAIN.
> The comments of this part:
> After called mapping->a_ops->putback_page, zsmalloc can free the page
> from ZS_EMPTY list.
>
> If retrun -EAGAIN, the page will be not be put back. EAGAIN page will
> be try again in migrate_pages without re-isolate.

You're right. With -EGAIN, it burns out CPU pointlessly.

>
> > About schedule_work, as you said, we don't make sure when it happens but
> > I believe it will happen in a migration iteration most of case.
> > How often do you see that case?
>
> I noticed this issue because my Kernel patch https://lkml.org/lkml/2014/5/28/113
> that will remove retry in __alloc_contig_migrate_range.
> This retry willhandle the -EBUSY because it will re-isolate the page
> and re-call migrate_pages.
> Without it will make cma_alloc fail at once with -EBUSY.

LKML.org server is not responding so hard to see patch you mentioned
but I just got your point now so I don't care any more. Your patch is
enough simple as considering the benefit.
Just look at below comment.

>
> >
> >>
> >> And I didn't find anything that make zs_page_migrate cannot work with
> >> a ZS_EMPTY zspage.
> >> So make the patch to not check inuse if migrate_mode is not
> >> MIGRATE_ASYNC.
> >
> > At a first glance, I think it work but the question is that it a same problem
> > ith schedule_work of zs_page_putback. IOW, Until the work is done, compaction
> > cannot succeed. Do you have any number before and after?
> >
>
>
> Following is what I got with highalloc-performance in a vbox with 2
> cpu 1G memory 512 zram as swap:
> ori afte
> orig after
> Minor Faults 50805113 50801261
> Major Faults 43918 46692
> Swap Ins 42087 46299
> Swap Outs 89718 105495
> Allocation stalls 0 0
> DMA allocs 57787 69787
> DMA32 allocs 47964599 47983772
> Normal allocs 0 0
> Movable allocs 0 0
> Direct pages scanned 45493 28837
> Kswapd pages scanned 1565222 1512947
> Kswapd pages reclaimed 1342222 1334030
> Direct pages reclaimed 45615 30174
> Kswapd efficiency 85% 88%
> Kswapd velocity 1897.101 1708.309
> Direct efficiency 100% 104%
> Direct velocity 55.139 32.561
> Percentage direct scans 2% 1%
> Zone normal velocity 1952.240 1740.870
> Zone dma32 velocity 0.000 0.000
> Zone dma velocity 0.000 0.000
> Page writes by reclaim 89764.000 106043.000
> Page writes file 46 548
> Page writes anon 89718 105495
> Page reclaim immediate 21457 7269
> Sector Reads 3259688 3144160
> Sector Writes 3667252 3675528
> Page rescued immediate 0 0
> Slabs scanned 1042872 1035438
> Direct inode steals 8042 7772
> Kswapd inode steals 54295 55075
> Kswapd skipped wait 0 0
> THP fault alloc 175 200
> THP collapse alloc 226 363
> THP splits 0 0
> THP fault fallback 11 1
> THP collapse fail 3 1
> Compaction stalls 536 647
> Compaction success 322 384
> Compaction failures 214 263
> Page migrate success 119608 127002
> Page migrate failure 2723 2309
> Compaction pages isolated 250179 265318
> Compaction migrate scanned 9131832 9351314
> Compaction free scanned 2093272 3059014
> Compaction cost 192 202
> NUMA alloc hit 47124555 47086375
> NUMA alloc miss 0 0
> NUMA interleave hit 0 0
> NUMA alloc local 47124555 47086375
> NUMA base PTE updates 0 0
> NUMA huge PMD updates 0 0
> NUMA page range updates 0 0
> NUMA hint faults 0 0
> NUMA hint local faults 0 0
> NUMA hint local percent 100 100
> NUMA pages migrated 0 0
> AutoNUMA cost 0% 0%
>
> It looks Page migrate success is increased.
>
> Thanks,
> Hui
>
>
> > Thanks.
> >
> >>
> >> Signed-off-by: Hui Zhu <zhuhui@xxxxxxxxxx>
> >> ---
> >> mm/zsmalloc.c | 66 +++++++++++++++++++++++++++++++++--------------------------
> >> 1 file changed, 37 insertions(+), 29 deletions(-)
> >>
> >> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> >> index d41edd2..c298e5c 100644
> >> --- a/mm/zsmalloc.c
> >> +++ b/mm/zsmalloc.c
> >> @@ -1982,6 +1982,7 @@ int zs_page_migrate(struct address_space *mapping, struct page *newpage,
> >> unsigned long old_obj, new_obj;
> >> unsigned int obj_idx;
> >> int ret = -EAGAIN;
> >> + int inuse;
> >>
> >> VM_BUG_ON_PAGE(!PageMovable(page), page);
> >> VM_BUG_ON_PAGE(!PageIsolated(page), page);
> >> @@ -1996,21 +1997,24 @@ int zs_page_migrate(struct address_space *mapping, struct page *newpage,
> >> offset = get_first_obj_offset(page);
> >>
> >> spin_lock(&class->lock);
> >> - if (!get_zspage_inuse(zspage)) {
> >> + inuse = get_zspage_inuse(zspage);
> >> + if (mode == MIGRATE_ASYNC && !inuse) {
> >> ret = -EBUSY;
> >> goto unlock_class;
> >> }
> >>
> >> pos = offset;
> >> s_addr = kmap_atomic(page);
> >> - while (pos < PAGE_SIZE) {
> >> - head = obj_to_head(page, s_addr + pos);
> >> - if (head & OBJ_ALLOCATED_TAG) {
> >> - handle = head & ~OBJ_ALLOCATED_TAG;
> >> - if (!trypin_tag(handle))
> >> - goto unpin_objects;
> >> + if (inuse) {

I don't want to add inuse check for every loop. It might avoid unncessary
looping in every loop of zs_page_migrate so it is for optimization, not
correction. As I consider it would happen rarely, I think we don't need
to add the check. Could you just remove get_zspage_inuse check, instead?

like this.


diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 013eea76685e..2d3d75fb0f16 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1980,14 +1980,9 @@ int zs_page_migrate(struct address_space *mapping, struct page *newpage,
pool = mapping->private_data;
class = pool->size_class[class_idx];
offset = get_first_obj_offset(page);
+ pos = offset;

spin_lock(&class->lock);
- if (!get_zspage_inuse(zspage)) {
- ret = -EBUSY;
- goto unlock_class;
- }
-
- pos = offset;
s_addr = kmap_atomic(page);
while (pos < PAGE_SIZE) {
head = obj_to_head(page, s_addr + pos);



> >> + while (pos < PAGE_SIZE) {
> >> + head = obj_to_head(page, s_addr + pos);
> >> + if (head & OBJ_ALLOCATED_TAG) {
> >> + handle = head & ~OBJ_ALLOCATED_TAG;
> >> + if (!trypin_tag(handle))
> >> + goto unpin_objects;
> >> + }
> >> + pos += class->size;
> >> }
> >> - pos += class->size;
> >> }
> >>
> >> /*
> >> @@ -2020,20 +2024,22 @@ int zs_page_migrate(struct address_space *mapping, struct page *newpage,
> >> memcpy(d_addr, s_addr, PAGE_SIZE);
> >> kunmap_atomic(d_addr);
> >>
> >> - for (addr = s_addr + offset; addr < s_addr + pos;
> >> - addr += class->size) {
> >> - head = obj_to_head(page, addr);
> >> - if (head & OBJ_ALLOCATED_TAG) {
> >> - handle = head & ~OBJ_ALLOCATED_TAG;
> >> - if (!testpin_tag(handle))
> >> - BUG();
> >> -
> >> - old_obj = handle_to_obj(handle);
> >> - obj_to_location(old_obj, &dummy, &obj_idx);
> >> - new_obj = (unsigned long)location_to_obj(newpage,
> >> - obj_idx);
> >> - new_obj |= BIT(HANDLE_PIN_BIT);
> >> - record_obj(handle, new_obj);
> >> + if (inuse) {
> >> + for (addr = s_addr + offset; addr < s_addr + pos;
> >> + addr += class->size) {
> >> + head = obj_to_head(page, addr);
> >> + if (head & OBJ_ALLOCATED_TAG) {
> >> + handle = head & ~OBJ_ALLOCATED_TAG;
> >> + if (!testpin_tag(handle))
> >> + BUG();
> >> +
> >> + old_obj = handle_to_obj(handle);
> >> + obj_to_location(old_obj, &dummy, &obj_idx);
> >> + new_obj = (unsigned long)
> >> + location_to_obj(newpage, obj_idx);
> >> + new_obj |= BIT(HANDLE_PIN_BIT);
> >> + record_obj(handle, new_obj);
> >> + }
> >> }
> >> }
> >>
> >> @@ -2055,14 +2061,16 @@ int zs_page_migrate(struct address_space *mapping, struct page *newpage,
> >>
> >> ret = MIGRATEPAGE_SUCCESS;
> >> unpin_objects:
> >> - for (addr = s_addr + offset; addr < s_addr + pos;
> >> + if (inuse) {
> >> + for (addr = s_addr + offset; addr < s_addr + pos;
> >> addr += class->size) {
> >> - head = obj_to_head(page, addr);
> >> - if (head & OBJ_ALLOCATED_TAG) {
> >> - handle = head & ~OBJ_ALLOCATED_TAG;
> >> - if (!testpin_tag(handle))
> >> - BUG();
> >> - unpin_tag(handle);
> >> + head = obj_to_head(page, addr);
> >> + if (head & OBJ_ALLOCATED_TAG) {
> >> + handle = head & ~OBJ_ALLOCATED_TAG;
> >> + if (!testpin_tag(handle))
> >> + BUG();
> >> + unpin_tag(handle);
> >> + }
> >> }
> >> }
> >> kunmap_atomic(s_addr);
> >> --
> >> 1.9.1
> >>
> >> --
> >> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> >> the body to majordomo@xxxxxxxxxx For more info on Linux MM,
> >> see: http://www.linux-mm.org/ .
> >> Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>