Re: [PATCH] mm: Add Kcompressd for accelerated memory compression

From: Qun-wei Lin (林群崴)
Date: Tue Jul 08 2025 - 23:25:46 EST


On Fri, 2025-06-27 at 16:21 -0700, Nhat Pham wrote:
>
> On Sun, Jun 22, 2025 at 10:16 PM Barry Song <21cnbao@xxxxxxxxx>
> wrote:
> >
> > Hi Nhat,
> >
> > On Wed, Jun 18, 2025 at 2:21 AM Nhat Pham <nphamcs@xxxxxxxxx>
> > wrote:
> > >
> > > On Sun, Jun 15, 2025 at 8:41 PM Barry Song <21cnbao@xxxxxxxxx>
> > > wrote:
> > > > > >
> > > > > > That seems unnecessary. There is an existing method for
> > > > > > asynchronous
> > > > > > writeback, and pageout() is naturally fully set up to
> > > > > > handle this.
> > > > > >
> > > > > > IMO the better way to do this is to make zswap_store() (and
> > > > > > zram_bio_write()?) asynchronous. Make those functions queue
> > > > > > the work
> > > > > > and wake the compression daemon, and then have the daemon
> > > > > > call
> > > > > > folio_end_writeback() / bio_endio() when it's done with it.
> > > >
> > > > > +1.
> > > >
> > > >
> > > > But how could this be possible for zswap? zswap_store() is
> > > > only a frontend: we still need its return value to determine
> > > > whether __swap_writepage() is required, so waiting for the
> > > > result of zswap_store() is inherently a synchronous step.
> > >
> > > Hmm, I might be misunderstanding either of you, but it sounds
> > > like
> > > what you're describing here does not contradict what Johannes is
> > > proposing?
> >
> > It seems contradictory: Johannes proposes that zswap could behave
> > like zram by invoking `folio_end_writeback()` or `bio_endio()`,
> > but this doesn't match the actual behavior, since `zswap_store()`
> > might not conclude `swap_writeout()`; it may still fall through to
> > `__swap_writepage()` to complete the final steps.
> >
> > Meanwhile, Qun-wei’s RFC has already explored using
> > `folio_end_writeback()` and
> > `bio_endio()` at the end of `__swap_writepage()` for zRAM, though
> > that approach
> > also has its own issues.
>
>
> Hmm OK. I'll let Johannes comment on this then :)

Hi Johannes,

Would appreciate your feedback when you have a moment.

>
> >
> > >
> > > >
> > > > My point is that folio_end_writeback() and bio_endio() can
> > > > only be called after the entire zswap_store() →
> > > > __swap_writepage() sequence is completed. That's why both are
> > > > placed in the new kcompressd.
> > >
> > > Hmm, how about:
> > >
> > > 1. Inside zswap_store(), we first obtain the obj_cgroup
> > > reference,
> > > check cgroup and pool limit, and grab a zswap pool reference (in
> > > effect, determining the slot allocator and compressor).
> > >
> > > 2. Next, we try to queue the work to kcompressd, saving the folio
> > > and
> > > the zswap pool (and whatever else we need for the continuation).
> > > If
> > > this fails, we can proceed with the old synchronous path.
> > >
> > > 3. In the kcompressd daemon, we perform the continuation of
> > > zswap_store(): compression, slot allocation, storing, zswap's LRU
> > > modification, etc. If this fails, we check if the mem_cgroup
> > > enables
> > > writeback. If it's enabled, we can call __swap_writepage().
> > > Ideally,
> > > if writeback is disabled, we should activate the page, but it
> > > might
> > > not be possible since shrink_folio_list() might already re-add
> > > the
> > > page to the inactive lru. Maybe some modification of pageout()
> > > and
> > > shrink_folio_list() can make this work, but I haven't thought too
> > > deeply about it :) If it's impossible, we can perform async
> > > compression only for cgroups that enable writeback for now. Once
> > > we
> > > fix zswap's handling of incompressible pages, we can revisit this
> > > decision (+ SJ).
> > >
> > > TLDR: move the work-queueing step forward a bit, into the middle
> > > of
> > > zswap_store().
> > >
> > > One benefit of this is we skip pages of cgroups that disable
> > > zswap, or
> > > when zswap pool is full.
> >
> > I assume you meant something like the following:
> >
> > bool try_to_sched_async_zswap_store(struct folio *folio)
> > {
> >         objcg = get_obj_cgroup_from_folio(folio);
> >         if (!objcg)
> >                 goto xxx;
> >         if (zswap_check_limits())
> >                 goto xxx;
> >         pool = zswap_pool_current_get();
> >         if (!pool)
> >                 goto xxx;
> >
> >         queue_folio_to_kcompressd(folio);
>
> Something like this, yeah. Can queue_folio_to_kcompressd() fail? If
> so, we can also try synchronous compression on failure here
> (__zswap_store() ?).
>
>
> >         return true;
> >
> > xxx:
> >         /* drop the references taken above */
> >         return false;
> > }
> >
> > If this function returns true, it suggests that compression
> > requests
> > have been queued to kcompressd. Following that, in kcompressd():
> >
> > int __zswap_store(struct folio *folio)
> > {
> >         for (i = 0; i < nr_pages; i++) {
> >                 err = zswap_store_page();
> >                 if (err)
> >                         return err;
> >         }
> >         return 0;
> > }
> >
> > kcompressd()
> > {
> >         while (folio_queue_is_not_empty()) {
> >                 folio = dequeue_folio();
> >                 if (folio_queued_by_zswap(folio)) {
> >                         if (!__zswap_store(folio))
> >                                 continue;
> >                 }
> >                 if ((zswap_store_page_fails &&
> >                      mem_cgroup_zswap_writeback_enabled()) ||
> >                     folio_queued_by_zram(folio)) {
>
> If !mem_cgroup_zswap_writeback_enabled(), I wonder if we can activate
> the page here?
>
> >                         __swap_writepage();
> >                 }
> >         }
> > }
> >
> > In kswapd, we will need to do
> > int swap_writeout(struct folio *folio, struct swap_iocb
> > **swap_plug)
> > {
> >         ...
> >         if (try_to_sched_async_zswap_store(folio))
> >                 return 0;
> >         if (is_sync_comp_blkdev(swap)) {
> >                 queue_folio_to_kcompressd(folio);
> >                 return 0;
> >         }
> >         __swap_writepage();
> > }
> >
> > To be honest, I'm not sure if there's a flag that indicates whether
> > the
> > folio was queued by zswap or zram. If not, we may need to add a
> > member
>
> I don't think there is.
>
> > associated with folio pointers in the queue between kswapd and
> > kcompressd,
> > since we need to identify zswap cases. Maybe we can reuse bit 0 of
> > the
> > folio pointer?
> >
> > What I mean is: while queuing, if the folio is queued by zswap, we
> > do
> > `pointer |= BIT(0)`. Then in kcompressd, we restore the original
> > folio
> > with `folio = pointer & ~BIT(0)`. It's a bit ugly, but I’m not sure
> > there’s a better approach.
>
> I think this approach is fine.
>
> We can also hack struct zswap_entry, but that would require an extra
> xarray look up. OTOH, if we can assume that zram users will not
> enable
> zswap, we might optimize that lookup away? Not sure if it's much
> cleaner than just pointer tagging though.
>
> >
> > Thanks
> > Barry

Best regards,
Qun-Wei