Re: [PATCH 04/16] erofs: introduce bufvec to store decompressed buffers

From: Gao Xiang
Date: Fri Jul 15 2022 - 02:36:21 EST


Hi Yue,

On Fri, Jul 15, 2022 at 02:29:30PM +0800, Yue Hu wrote:
> On Thu, 14 Jul 2022 21:20:39 +0800
> Gao Xiang <hsiangkao@xxxxxxxxxxxxxxxxx> wrote:
>
> > For each pcluster, the total compressed buffers are determined in
> > advance, yet the number of decompressed buffers actually vary. Too
> > many decompressed pages can be recorded if one pcluster is highly
> > compressed or its pcluster size is large. That takes extra memory
> > footprints compared to uncompressed filesystems, especially a lot of
> > I/O in flight on low-ended devices.
> >
> > Therefore, similar to inplace I/O, pagevec was introduced to reuse
> > page cache to store these pointers in the time-sharing way since
> > these pages are actually unused before decompressing.
> >
> > In order to make it more flexable, a cleaner bufvec is used to
> > replace the old pagevec stuffs so that
> >
> > - Decompressed offsets can be stored inline, thus it can be used
> > for the upcoming feature like compressed data deduplication;
> >
> > - Towards supporting large folios for compressed inodes since
> > our final goal is to completely avoid page->private but use
> > folio->private only for all page cache pages.
> >
> > Signed-off-by: Gao Xiang <hsiangkao@xxxxxxxxxxxxxxxxx>
> > ---
> > fs/erofs/zdata.c | 177 +++++++++++++++++++++++++++++++++++------------
> > fs/erofs/zdata.h | 26 +++++--
> > 2 files changed, 153 insertions(+), 50 deletions(-)
> >
> > diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
> > index c183cd0bc42b..f52c54058f31 100644
> > --- a/fs/erofs/zdata.c
> > +++ b/fs/erofs/zdata.c
> > @@ -2,6 +2,7 @@
> > /*
> > * Copyright (C) 2018 HUAWEI, Inc.
> > * https://www.huawei.com/
> > + * Copyright (C) 2022 Alibaba Cloud
> > */
> > #include "zdata.h"
> > #include "compress.h"
> > @@ -26,6 +27,82 @@ static struct z_erofs_pcluster_slab pcluster_pool[] __read_mostly = {
> > _PCLP(Z_EROFS_PCLUSTER_MAX_PAGES)
> > };
> >
> > +struct z_erofs_bvec_iter {
> > + struct page *bvpage;
> > + struct z_erofs_bvset *bvset;
> > + unsigned int nr, cur;
> > +};
> > +
> > +static struct page *z_erofs_bvec_iter_end(struct z_erofs_bvec_iter *iter)
> > +{
> > + if (iter->bvpage)
> > + kunmap_local(iter->bvset);
> > + return iter->bvpage;
> > +}
> > +
> > +static struct page *z_erofs_bvset_flip(struct z_erofs_bvec_iter *iter)
> > +{
> > + unsigned long base = (unsigned long)((struct z_erofs_bvset *)0)->bvec;
> > + /* have to access nextpage in advance, otherwise it will be unmapped */
> > + struct page *nextpage = iter->bvset->nextpage;
> > + struct page *oldpage;
> > +
> > + DBG_BUGON(!nextpage);
> > + oldpage = z_erofs_bvec_iter_end(iter);
> > + iter->bvpage = nextpage;
> > + iter->bvset = kmap_local_page(nextpage);
> > + iter->nr = (PAGE_SIZE - base) / sizeof(struct z_erofs_bvec);
> > + iter->cur = 0;
> > + return oldpage;
> > +}
> > +
> > +static void z_erofs_bvec_iter_begin(struct z_erofs_bvec_iter *iter,
> > + struct z_erofs_bvset_inline *bvset,
> > + unsigned int bootstrap_nr,
> > + unsigned int cur)
> > +{
> > + *iter = (struct z_erofs_bvec_iter) {
> > + .nr = bootstrap_nr,
> > + .bvset = (struct z_erofs_bvset *)bvset,
> > + };
> > +
> > + while (cur > iter->nr) {
> > + cur -= iter->nr;
> > + z_erofs_bvset_flip(iter);
> > + }
> > + iter->cur = cur;
> > +}
> > +
> > +static int z_erofs_bvec_enqueue(struct z_erofs_bvec_iter *iter,
> > + struct z_erofs_bvec *bvec,
> > + struct page **candidate_bvpage)
> > +{
> > + if (iter->cur == iter->nr) {
> > + if (!*candidate_bvpage)
> > + return -EAGAIN;
> > +
> > + DBG_BUGON(iter->bvset->nextpage);
> > + iter->bvset->nextpage = *candidate_bvpage;
> > + z_erofs_bvset_flip(iter);
> > +
> > + iter->bvset->nextpage = NULL;
> > + *candidate_bvpage = NULL;
> > + }
> > + iter->bvset->bvec[iter->cur++] = *bvec;
> > + return 0;
> > +}
> > +
> > +static void z_erofs_bvec_dequeue(struct z_erofs_bvec_iter *iter,
> > + struct z_erofs_bvec *bvec,
> > + struct page **old_bvpage)
> > +{
> > + if (iter->cur == iter->nr)
> > + *old_bvpage = z_erofs_bvset_flip(iter);
> > + else
> > + *old_bvpage = NULL;
> > + *bvec = iter->bvset->bvec[iter->cur++];
> > +}
> > +
>
> Touch a new file to include bufvec related code? call it zbvec.c/h?

Thanks for the suggestion. The new implementation is simple enough,
so I tend to directly leave it in zdata.c instead of making more
churn to add more new files and zpvec.h will be completely removed
in the following patch.

Thanks,
Gao Xiang