Re: [PATCH] zsmalloc: fix linking bug in init_zspage

From: Minchan Kim
Date: Tue Aug 14 2018 - 09:27:03 EST


Hi Sergey,

On Mon, Aug 13, 2018 at 07:55:36PM +0900, Sergey Senozhatsky wrote:
> On (08/13/18 15:05), Minchan Kim wrote:
> > > From: zhouxianrong <zhouxianrong@xxxxxxxxxx>
> > >
> > > The last partial object in last subpage of zspage should not be linked
> > > in allocation list. Otherwise it could trigger BUG_ON explicitly at
> > > function zs_map_object. But it happened rarely.
> >
> > Could you be more specific? What case did you see the problem?
> > Is it a real problem or one founded by review?
> [..]
> > > Signed-off-by: zhouxianrong <zhouxianrong@xxxxxxxxxx>
> > > ---
> > > mm/zsmalloc.c | 2 ++
> > > 1 file changed, 2 insertions(+)
> > >
> > > diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> > > index 8d87e973a4f5..24dd8da0aa59 100644
> > > --- a/mm/zsmalloc.c
> > > +++ b/mm/zsmalloc.c
> > > @@ -1040,6 +1040,8 @@ static void init_zspage(struct size_class *class, struct zspage *zspage)
> > > * Reset OBJ_TAG_BITS bit to last link to tell
> > > * whether it's allocated object or not.
> > > */
> > > + if (off > PAGE_SIZE)
> > > + link -= class->size / sizeof(*link);
> > > link->next = -1UL << OBJ_TAG_BITS;
> > > }
> > > kunmap_atomic(vaddr);
>
> Hmm. This can be a real issue. Unless I'm missing something.
>
> So... I might be wrong, but the way I see the bug report is:
>
> When we link objects during zspage init, we do the following:
>
> while ((off += class->size) < PAGE_SIZE) {
> link->next = freeobj++ << OBJ_TAG_BITS;
> link += class->size / sizeof(*link);
> }
>
> Note that we increment the link first, link += class->size / sizeof(*link),
> and check for the offset only afterwards. So by the time we break out of
> the while-loop the link *might* point to the partial object which starts at
> the last page of zspage, but *never* ends, because we don't have next_page
> in current zspage. So that's why that object should not be linked in,
> because it's not a valid allocates object - we simply don't have space
> for it anymore.
>
> zspage [ page 1 ][ page 2 ]
> ...............................link
> [..###]
>
> therefore the last object must be "link - 1" for such cases.
>
> I think, the following change can also do the trick:
>
> while ((off + class->size) < PAGE_SIZE) {
> link->next = freeobj++ << OBJ_TAG_BITS;
> link += class->size / sizeof(*link);
> off += class->size;
> }
>
> Once again, I might be wrong on this.
> Any thoughts?

If we want a refactoring, I'm not against but description said it tiggered
BUG_ON on zs_map_object rarely. That means it should be stable material
and need more description to understand. Please be more specific with
some example. The reason I'm hesitating is zsmalloc moves ZS_FULL group
when the zspage->inuse is equal to class->objs_per_zspage so I thought
it shouldn't allocate last partial object.

Thanks.