RE: [GIT PULL] mm: frontswap (for 3.2 window)

From: Dan Magenheimer
Date: Wed Nov 02 2011 - 17:42:26 EST


> From: Rik van Riel [mailto:riel@xxxxxxxxxx]
> Subject: Re: [GIT PULL] mm: frontswap (for 3.2 window)
>
> On 11/01/2011 05:43 PM, Andrew Morton wrote:
>
> > I will confess to and apologise for dropping the ball on cleancache and
> > frontswap. I was never really able to convince myself that it met the
> > (very vague) cost/benefit test,
>
> I believe that it can, but if it does, we also have to
> operate under the assumption that the major distros will
> enable it.
> This means that "no overhead when not compiled in" is
> not going to apply to the majority of the users out there,
> and we need clear numbers on what the overhead is when it
> is enabled, but not used.

Right. That's Case B (see the James Bottomley
subthread), and the overhead is one pointer
comparison against NULL per page physically swapped
in or out to a swap device (i.e., essentially zero).
Rik, would you be willing to examine the code to
confirm that statement?
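
(For reference, here is roughly what that Case B check looks
like; this is an illustrative sketch, not a verbatim excerpt
from the posted patches:)

/* Illustrative sketch only -- not a verbatim excerpt of the
 * patches.  If no tmem backend has registered, the hook in the
 * swapout path reduces to a single NULL-pointer test before
 * falling through to the normal swap device write.
 */
struct page;
struct frontswap_ops;

static struct frontswap_ops *frontswap_ops;  /* NULL until a backend registers */

static inline int frontswap_store(struct page *page)
{
	if (frontswap_ops == NULL)	/* Case B: compiled in, never used */
		return -1;		/* caller writes to the swap device */

	/* ... otherwise hand the page to the registered backend ... */
	return 0;
}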

> We also need an API that can handle arbitrarily heavy
> workloads, since that is what people will throw at it
> if it is enabled everywhere.
>
> I believe that means addressing some of Andrea's concerns,
> specifically that the API should be able to handle vectors
> of pages and handle them asynchronously.
>
> Even if the current back-ends do not handle that today,
> chances are that (if tmem were to be enabled everywhere)
> people will end up throwing workloads at tmem that pretty
> much require such a thing.

Wish I'd been a little faster typing the previous
message. Rik, could you make a point of following
up here if you are happy with my proposed design
for the batching that you and Andrea want? (And if
you are not happy, could you provide code showing
where you would place a new batch-put hook?)
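
To make that concrete, the shape of the batch-put hook I have
in mind is roughly the following (names are hypothetical, not
taken from the posted patches):

/* Hypothetical batch-put hook -- names are illustrative only.
 * A vector of pages bound for the same swap area is offered to
 * the backend in one call; it returns how many it accepted, and
 * the remainder are written to the swap device as usual.
 */
struct page;

int frontswap_store_batch(unsigned type, pgoff_t *offsets,
			  struct page **pages, int nr_pages);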

> An asynchronous interface would probably be a requirement
> for something as high latency as encrypted ramster :)

Pure asynchrony is a show-stopper for me. But the
only synchrony required is to move/transform the
data locally. Asynchronous work can still be done,
but in a separate thread AFTER the data has been
"put" to tmem (which is exactly what RAMster does).

If asynchrony at frontswap_ops is demanded (and
I think Andrea has already retracted that demand),
I would have to ask you to present alternate code,
both hooks and driver, that works successfully,
because my claim is that it can't be done, certainly
not without massive changes to the swap subsystem
(and likely corresponding massive changes to VFS
for cleancache).

> API concerns like this are things that should be solved
> before a merge IMHO, since afterwards we would end up with
> the "we cannot change the API, because that breaks users"
> scenario that we always end up finding ourselves in.

I think the points above amply demonstrate that the
API is minimal and extensible. Many of Andrea's
concerns stemmed from a misunderstanding of the code
in staging/zcache, thinking it was part of the API;
the only "API" being considered here is defined by
frontswap_ops.
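
For reference, the entire API under discussion is a handful of
function pointers, roughly as follows (shown approximately;
field names may differ slightly between versions of the series):

/* Approximate shape of frontswap_ops; field names may differ
 * slightly between postings.  A backend (zcache, RAMster, the
 * Xen tmem driver) registers one of these; the swap subsystem
 * sees nothing beyond it.
 */
struct frontswap_ops {
	void (*init)(unsigned type);		/* a swap area was enabled */
	int  (*put_page)(unsigned type, pgoff_t offset,
			 struct page *page);	/* store a page, may fail */
	int  (*get_page)(unsigned type, pgoff_t offset,
			 struct page *page);	/* fetch it back */
	void (*flush_page)(unsigned type, pgoff_t offset);	/* drop one page */
	void (*flush_area)(unsigned type);	/* drop the whole area */
};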

Also, the API for frontswap_ops is almost identical
to the API for cleancache_ops and uses a much simpler,
much more isolated set of hooks. Frontswap "finishes"
tmem; cleancache is already merged. Leaving tmem
unfinished is worse than not having it at all (and
I can already hear Christoph cackling and jumping
to his keyboard ;-)

Thanks,
Dan

OK, I really need to discontinue my participation in
this for a couple of days for personal/health reasons,
so I hope I've made my case.