Re: suppress page allocation failure warnings from sys_listxattr

From: Dave Chinner
Date: Wed Mar 28 2012 - 00:39:59 EST


On Tue, Mar 27, 2012 at 08:15:50PM -0400, Dave Jones wrote:
> On Tue, Mar 27, 2012 at 03:51:49PM -0700, Andrew Morton wrote:
> > On Tue, 13 Mar 2012 14:22:20 -0400
> > Dave Jones <davej@xxxxxxxxxx> wrote:
> >
> > > This size is user controllable, and so it's trivial for someone to trigger a
> > > stream of order:4 page allocation errors.
> > >
> > > Signed-off-by: Dave Jones <davej@xxxxxxxxxx>
> > >
> > > ---
> > > There's also a similar problem in setxattr, but I'm not sure how we want
> > > to pass NOWARN down to memdup_user. Thoughts ?
> > >
> > > diff --git a/fs/xattr.c b/fs/xattr.c
> > > index 82f4337..544df90 100644
> > > --- a/fs/xattr.c
> > > +++ b/fs/xattr.c
> > > @@ -496,7 +496,7 @@ listxattr(struct dentry *d, char __user *list, size_t size)
> > >  	if (size) {
> > >  		if (size > XATTR_LIST_MAX)
> > >  			size = XATTR_LIST_MAX;
> > > -		klist = kmalloc(size, GFP_KERNEL);
> > > +		klist = kmalloc(size, __GFP_NOWARN | GFP_KERNEL);
> > >  		if (!klist)
> > >  			return -ENOMEM;
> > >  	}
> >
> > hm. The patch is good, but one would hope that it isn't "trivial" to
> > trigger a page allocation failure for a kmalloc(65536, GFP_KERNEL) -
> > the VM is supposed to be able to handle that.
> >
> > Is it really *that* easy, or is Something Unusual happening with that
> > machine?
>
> Well, the unusual thing was that I was fuzzing system calls for a few hours.
>
> My fuzzing tool was able to trigger these very easily after an hour or two
> of uptime and memory had fragmented a little, so yeah, quite trivial.

We've recently been seeing reports of xfsdump triggering similar
allocation failures in the XFS attr code when we are doing hundreds
of thousands of attribute lookups to back them up.

ad650f5 xfs: fallback to vmalloc for large buffers in xfs_attrmulti_attr_get

I think that falling back to vmalloc here is a much better solution
than failing to retrieve the attribute - it will work no matter how
fragmented memory gets. That means we don't get incomplete
backups occurring after days or months of uptime and successful
backups...
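
Roughly, the fallback pattern looks like the userspace sketch below.
This is not the code from the commit above; contig_alloc(), virt_alloc(),
struct attr_buf and CONTIG_LIMIT are all hypothetical stand-ins so that
it compiles outside the kernel. In the kernel the first attempt would be
a kmalloc() (with __GFP_NOWARN, so a failure stays quiet) and the
fallback a vmalloc(), with the matching kfree()/vfree() on release.

```c
#include <stddef.h>
#include <stdlib.h>

/* Assumption: pretend any request above one 4k page fails, to simulate
 * a fragmented machine where high-order allocations cannot be met. */
#define CONTIG_LIMIT 4096

enum buf_kind { BUF_NONE, BUF_CONTIG, BUF_VIRT };

struct attr_buf {
	void *ptr;
	size_t size;
	enum buf_kind kind;	/* remembers which allocator succeeded */
};

/* Stand-in for a physically contiguous allocation (kmalloc in the kernel). */
static void *contig_alloc(size_t size)
{
	if (size > CONTIG_LIMIT)
		return NULL;	/* simulated high-order failure */
	return malloc(size);
}

/* Stand-in for a virtually contiguous allocation (vmalloc in the kernel). */
static void *virt_alloc(size_t size)
{
	return malloc(size);
}

/* Try the contiguous allocator first, fall back only if it fails. */
static int attr_buf_alloc(struct attr_buf *b, size_t size)
{
	b->size = size;
	b->kind = BUF_CONTIG;
	b->ptr = contig_alloc(size);
	if (!b->ptr) {
		b->kind = BUF_VIRT;
		b->ptr = virt_alloc(size);
	}
	if (!b->ptr) {
		b->kind = BUF_NONE;
		return -1;	/* both failed; the kernel would return -ENOMEM */
	}
	return 0;
}

/* Release the buffer; the kernel would pick kfree() or vfree() from kind. */
static void attr_buf_free(struct attr_buf *b)
{
	free(b->ptr);
	b->ptr = NULL;
	b->kind = BUF_NONE;
}
```

With these stand-ins a 256 byte request is served by the contiguous
path, while a 64k request (the XATTR_LIST_MAX case discussed above)
takes the fallback instead of failing.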

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx