RE: [PATCH 7/8] zswap: add to mm/

From: Dan Magenheimer
Date: Mon Dec 31 2012 - 18:07:09 EST


> From: Seth Jennings [mailto:sjenning@xxxxxxxxxxxxxxxxxx]
> Subject: [PATCH 7/8] zswap: add to mm/
>
> zswap is a thin compression backend for frontswap. It receives
> pages from frontswap and attempts to store them in a compressed
> memory pool, resulting in an effective partial memory reclaim and
> dramatically reduced swap device I/O.

Hi Seth --

Happy (almost) New Year!

I am eagerly studying one of the details of your zswap "flush"
code in this patch to see how you solved a problem or two that
I was struggling with for the similar mechanism RFC'ed for zcache
(see https://lkml.org/lkml/2012/10/3/558). I like the way
that you force the newly-uncompressed to-be-flushed page immediately
into a swap bio in zswap_flush_entry via the call to swap_writepage,
though I'm not entirely convinced that there aren't some race
conditions there. However, won't swap_writepage simply call
frontswap_store instead and re-compress the page back into zswap?
The (very ugly) solution I used for this was to flag the page in a
frontswap_denial_map (see https://lkml.org/lkml/2012/10/3/560).
Don't you require something like that also, or am I missing some
magic in your code?

I'm also a bit concerned about the consequent recursion:

frontswap_store->
zswap_fs_store->
zswap_flush_entries->
zswap_flush_entry->
__swap_writepage->
swap_writepage->
frontswap_store->
zswap_fs_store-> etc

It's not obvious to me how deeply this might recurse and/or
how the recursion is terminated. The RFC'ed zcache code
calls its equivalence of your "flush" code only from the
shrinker thread to avoid this, though there may be a third,
better, way.

A second related issue that concerns me is that, although you
are now, like zcache2, using an LRU queue for compressed pages
(aka "zpages"), there is no relationship between that queue and
physical pageframes. In other words, you may free up 100 zpages
out of zswap via zswap_flush_entries, but not free up a single
pageframe. This seems like a significant design issue. Or am
I misunderstanding the code?

A third concern is about scalability... the locking seems very
coarse-grained. In zcache, you personally observed and fixed
hashbucket contention (see https://lkml.org/lkml/2011/9/29/215).
Doesn't zswap's tree_lock essentially use a single tree (per
swaptype), i.e. no scalability?

Thanks,
Dan
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/