Re: [PATCHv2 8/9] zswap: add to mm/

From: Seth Jennings
Date: Mon Jan 28 2013 - 10:29:06 EST


On 01/25/2013 04:44 PM, Rik van Riel wrote:
> On 01/07/2013 03:24 PM, Seth Jennings wrote:
>> zswap is a thin compression backend for frontswap. It receives
>> pages from frontswap and attempts to store them in a compressed
>> memory pool, resulting in an effective partial memory reclaim and
>> dramatically reduced swap device I/O.
>>
>> Additionally, in most cases, pages can be retrieved from this
>> compressed store much more quickly than reading from traditional
>> swap devices, resulting in faster performance for many workloads.
>>
>> This patch adds the zswap driver to mm/
>>
>> Signed-off-by: Seth Jennings <sjenning@xxxxxxxxxxxxxxxxxx>
>
> I like the approach of flushing pages into actual disk based
> swap when compressed swap is full. I would like it if that
> was advertised more prominently in the changelog :)

Thanks so much for the review!

> The code looks mostly good, complaints are at the nitpick level.
>
> One worry is that the pool can grow to whatever maximum was
> decided, and there is no way to shrink it when memory is
> required for something else.
>
> Would it be an idea to add a shrinker for the zcache pool,
> that can also shrink the zcache pool when required?
>
> Of course, that does lead to the question of how to balance
> the pressure from that shrinker, with the new memory entering
> zcache from the swap side. I have no clear answers here, just
> something to think about...

Yes, I prototyped a shrinker interface for zswap, but, as we both
figured, it shrinks the zswap compressed pool so aggressively that it
ends up being useless.

Right now I'm working on a zswap thread that will "leak" pages out to
the swap device on an LRU basis over time. That way, if a page is
rarely accessed, it will eventually be written out to the swap device
and its memory freed, even if the zswap pool isn't full.
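
Something along these lines is what I have in mind (just a sketch; the
LRU list, lock, and zswap_writeback_entry() used here are placeholders,
not final code):

static int zswap_leak_thread(void *unused)
{
	struct zswap_entry *entry;

	while (!kthread_should_stop()) {
		spin_lock(&zswap_lru_lock);
		if (list_empty(&zswap_lru_list)) {
			spin_unlock(&zswap_lru_lock);
			schedule_timeout_interruptible(HZ);
			continue;
		}
		/* oldest (least recently stored) entry sits at the tail */
		entry = list_entry(zswap_lru_list.prev,
				struct zswap_entry, lru);
		list_del_init(&entry->lru);
		spin_unlock(&zswap_lru_lock);

		/* decompress and write the page back to the swap device */
		zswap_writeback_entry(entry);

		/* pace the leak so we don't flood the swap device */
		schedule_timeout_interruptible(HZ);
	}
	return 0;
}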

Would this address your concerns?

>> +static void zswap_flush_entries(unsigned type, int nr)
>> +{
>> +	struct zswap_tree *tree = zswap_trees[type];
>> +	struct zswap_entry *entry;
>> +	int i, ret;
>> +
>> +/*
>> + * This limit is arbitrary for now until a better
>> + * policy can be implemented. This is so we don't
>> + * eat all of RAM decompressing pages for writeback.
>> + */
>> +#define ZSWAP_MAX_OUTSTANDING_FLUSHES 64
>> +	if (atomic_read(&zswap_outstanding_flushes) >
>> +			ZSWAP_MAX_OUTSTANDING_FLUSHES)
>> +		return;
>
> Having this #define right in the middle of the function is
> rather ugly. Might be worth moving it to the top.

Yes. In my mind, this policy was going to be replaced by a better one
soon (checking may_write_to_queue() was the idea I had), so I didn't
spend too much time making that part pretty.
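
For the next version I'll just hoist it to the top of the file with the
other tunables, roughly:

/*
 * Arbitrary cap on in-flight flushes so we don't eat all of RAM
 * decompressing pages for writeback; to be replaced by a smarter
 * policy (something like checking may_write_to_queue()) later.
 */
#define ZSWAP_MAX_OUTSTANDING_FLUSHES	64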

>> +static int __init zswap_debugfs_init(void)
>> +{
>> +	if (!debugfs_initialized())
>> +		return -ENODEV;
>> +
>> +	zswap_debugfs_root = debugfs_create_dir("zswap", NULL);
>> +	if (!zswap_debugfs_root)
>> +		return -ENOMEM;
>> +
>> +	debugfs_create_u64("saved_by_flush", S_IRUGO,
>> +			zswap_debugfs_root, &zswap_saved_by_flush);
>> +	debugfs_create_u64("pool_limit_hit", S_IRUGO,
>> +			zswap_debugfs_root, &zswap_pool_limit_hit);
>> +	debugfs_create_u64("reject_flush_attempted", S_IRUGO,
>> +			zswap_debugfs_root, &zswap_flush_attempted);
>> +	debugfs_create_u64("reject_tmppage_fail", S_IRUGO,
>> +			zswap_debugfs_root, &zswap_reject_tmppage_fail);
>> +	debugfs_create_u64("reject_flush_fail", S_IRUGO,
>> +			zswap_debugfs_root, &zswap_reject_flush_fail);
>> +	debugfs_create_u64("reject_zsmalloc_fail", S_IRUGO,
>> +			zswap_debugfs_root, &zswap_reject_zsmalloc_fail);
>> +	debugfs_create_u64("reject_kmemcache_fail", S_IRUGO,
>> +			zswap_debugfs_root, &zswap_reject_kmemcache_fail);
>> +	debugfs_create_u64("reject_compress_poor", S_IRUGO,
>> +			zswap_debugfs_root, &zswap_reject_compress_poor);
>> +	debugfs_create_u64("flushed_pages", S_IRUGO,
>> +			zswap_debugfs_root, &zswap_flushed_pages);
>> +	debugfs_create_u64("duplicate_entry", S_IRUGO,
>> +			zswap_debugfs_root, &zswap_duplicate_entry);
>> +	debugfs_create_atomic_t("pool_pages", S_IRUGO,
>> +			zswap_debugfs_root, &zswap_pool_pages);
>> +	debugfs_create_atomic_t("stored_pages", S_IRUGO,
>> +			zswap_debugfs_root, &zswap_stored_pages);
>> +	debugfs_create_atomic_t("outstanding_flushes", S_IRUGO,
>> +			zswap_debugfs_root, &zswap_outstanding_flushes);
>> +
>
> Some of these statistics would be very useful to system
> administrators, who will not be mounting debugfs on
> production systems.
>
> Would it make sense to export some of these statistics
> through sysfs?

That's fine. Which of these stats do you think should be in sysfs?
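
pool_pages and stored_pages seem like the obvious candidates to me. If
a read-only attribute under /sys/kernel/mm/ is what you have in mind,
here's a rough sketch (the zswap_sysfs_init() name and placement are
hypothetical, not part of this patch):

static ssize_t pool_pages_show(struct kobject *kobj,
			struct kobj_attribute *attr, char *buf)
{
	return sprintf(buf, "%d\n", atomic_read(&zswap_pool_pages));
}
static struct kobj_attribute pool_pages_attr = __ATTR_RO(pool_pages);

static int __init zswap_sysfs_init(void)
{
	struct kobject *zswap_kobj;

	/* would create /sys/kernel/mm/zswap/pool_pages */
	zswap_kobj = kobject_create_and_add("zswap", mm_kobj);
	if (!zswap_kobj)
		return -ENOMEM;
	return sysfs_create_file(zswap_kobj, &pool_pages_attr.attr);
}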

Thanks again for taking time to look at this!

Seth
