Re: [PATCHv2 3/6] zsmalloc: fine-grained inuse ratio based fullness grouping

From: Minchan Kim
Date: Tue Feb 28 2023 - 17:54:00 EST


On Sun, Feb 26, 2023 at 01:38:22PM +0900, Sergey Senozhatsky wrote:
> On (23/02/23 15:27), Minchan Kim wrote:
> > > + * Pages are distinguished by the ratio of used memory (that is the ratio
> > > + * of ->inuse objects to all objects that page can store). For example,
> > > + * INUSE_RATIO_30 means that the ratio of used objects is > 20% and <= 30%.
> > > + *
> > > + * The number of fullness groups is not random. It allows us to keep
> > > + * diffeence between the least busy page in the group (minimum permitted
> > > + * number of ->inuse objects) and the most busy page (maximum permitted
> > > + * number of ->inuse objects) at a reasonable value.
> > > + */
> > > +#define ZS_INUSE_RATIO_0 0
> >
> > How about keeping ZS_EMPTY and ZS_FULL since they are used
> > multiple places in source code? It would have less churning.
>
> I have to admit that I sort of like the unified naming
> "zspage inuse ratio goes from 0 to 100"
>
> but I can keep ZS_EMPTY / ZS_FULL as two "special" inuse values.
>
> > > +#define ZS_INUSE_RATIO_10 1
> > > +#define ZS_INUSE_RATIO_20 2
> > > +#define ZS_INUSE_RATIO_30 3
> > > +#define ZS_INUSE_RATIO_40 4
> > > +#define ZS_INUSE_RATIO_50 5
> > > +#define ZS_INUSE_RATIO_60 6
> > > +#define ZS_INUSE_RATIO_70 7
> > > +#define ZS_INUSE_RATIO_80 8
> > > +#define ZS_INUSE_RATIO_90 9
> > > +#define ZS_INUSE_RATIO_99 10
> >
> > Do we really need all the define macro for the range from 10 to 99?
> > Can't we do this?
> >
> > enum class_stat_type {
> > ZS_EMPTY,
> > /*
> > * There are fullness buckets between 10% - 99%.
> > */
> > ZS_FULL = 11
> > NR_ZS_FULLNESS,
> > ZS_OBJ_ALLOCATED = NR_ZS_FULLNESS,
> > ZS_OBJ_USED,
> > NR_ZS_STAT,
> > }
>
> This creates undocumented secret constats, which are being heavily
> used (zspage fullness values, indeces in fullness lists arrays,
> stats array offsets, etc.) but have no trace in the code. And this
> also forces us to use magic number in the code. So should fullness
> grouping change, things like, for instance, zs_stat_get(7), will
> compile just fine yet will do something very different and we will
> have someone to spot the regression.
>
> So yes, it's 10 lines of defines, it's not even 10 lines of code, but
> 1) it is documentation, we keep constats documented
> 2) more importantly, it protects us from regressions and bugs
>
> From maintinability point of view, having everything excpliticly
> documented / spelled out is a win.
>
> As of why I decided to go with defines, this is because zspage fullness
> values and class stats are two conceptually different things, they don't
> really fit in one single enum, unless enum's name is "zs_constants".
> What do you think?

Agree. We don't need to combine them, then.
BTW, I still prefer the enum instead of 10 define.

enum fullness_group {
ZS_EMPTY,
ZS_INUSE_RATIO_MIN,
ZS_INUSE_RATIO_ALMOST_FULL = 7,
ZS_INUSE_RATIO_MAX = 10,
ZS_FULL,
NR_ZS_FULLNESS,
}

>
> [..]
> > > * Size of objects stored in this class. Must be multiple
> > > * of ZS_ALIGN.
> > > @@ -641,8 +644,23 @@ static int zs_stats_size_show(struct seq_file *s, void *v)
> > > continue;
> > >
> > > spin_lock(&pool->lock);
> > > - class_almost_full = zs_stat_get(class, ZS_ALMOST_FULL);
> > > - class_almost_empty = zs_stat_get(class, ZS_ALMOST_EMPTY);
> > > +
> > > + /*
> > > + * Replecate old behaviour for almost_full and almost_empty
> > > + * stats.
> > > + */
> > > + class_almost_full = zs_stat_get(class, ZS_INUSE_RATIO_99);
> > > + class_almost_full += zs_stat_get(class, ZS_INUSE_RATIO_90);
> > > + class_almost_full += zs_stat_get(class, ZS_INUSE_RATIO_80);
> > > + class_almost_full += zs_stat_get(class, ZS_INUSE_RATIO_70);
> >
> > > +
> > > + class_almost_empty = zs_stat_get(class, ZS_INUSE_RATIO_60);
> > > + class_almost_empty += zs_stat_get(class, ZS_INUSE_RATIO_50);
> > > + class_almost_empty += zs_stat_get(class, ZS_INUSE_RATIO_40);
> > > + class_almost_empty += zs_stat_get(class, ZS_INUSE_RATIO_30);
> > > + class_almost_empty += zs_stat_get(class, ZS_INUSE_RATIO_20);
> > > + class_almost_empty += zs_stat_get(class, ZS_INUSE_RATIO_10);
> >
> > I guess you can use just loop here from 1 to 6
> >
> > And then from 7 to 10 for class_almost_full.
>
> I can change it to
>
> for (r = ZS_INUSE_RATIO_10; r <= ZS_INUSE_RATIO_70; r++)
> and
> for (r = ZS_INUSE_RATIO_80; r <= ZS_INUSE_RATIO_99; r++)
>
> which would be safer than using hard-coded numbers.

I didn't mean to have hard code either but just wanted to show
the intention to use the loop.

>
> Shall we actually instead report per inuse ratio stats instead? I sort
> of don't see too many reasons to keep that below/above 3/4 thing.

Oh, yeah. Since it's debugfs, we would get excuse to break.