Re: [PATCH v3 0/3] cgroup: add xattr support

From: Hugh Dickins
Date: Sun Jul 22 2012 - 15:12:47 EST

On Fri, 20 Jul 2012, Aristeu Rozanski wrote:
> On Wed, Jul 18, 2012 at 06:11:32PM -0700, Hugh Dickins wrote:
> > > But why do we need something completely new? Can't we hijack some
> > > inodes used by tmpfs and use them for xattr storage? ie. Would it be
> > > difficult to use tmpfs as backend storage for on-memory xattr? With
> > > that, we would already have the mechanism and interface(!) for
> > > limiting the size.
> >
> > That sounds just like what I was suggesting in my last sentence:
> > let userspace manage a tmpfs hierarchy parallel to the cgroupfs one.
> >
> > Except, perhaps, where I assume "userspace" should be doing the hard work.
> hm, not sure that's what Tejun meant. tmpfs uses anonymous memory for the file
> contents, so reuse that infrastructure to allocate space for the extended
> attributes the same way, instead of using kmem.
> First thing I can think of is to use whole pages for it to prevent further
> complexity. Shouldn't make much difference considering the usecases we have
> now (systemd and containers), right?

Please, do not do this.

It may be fun to implement, but not to review and maintain.

If we're going to start supporting swappable kernel memory, tmpfs
xattrs is not the right place to start, and libfs xattrs certainly not:
they are a poor fit for swappable memory. (You contemplate using whole
pages above: that will not be very kind to those without swap.)

By all means continue Zefan's work to move xattr support from tmpfs
to libfs (ah, to fs/xattr.c actually, okay), but keep them as kmem.

Support setting and removing user xattrs only if the user has the
appropriate capability (which root will have): looking through the
list of existing capabilities, CAP_IPC_LOCK actually looks appropriate,
although I admit its name certainly does not - it's the "lock down
unlimited amounts of memory" capability.

And support setting and removing user xattrs only if the filesystem
opts in to that: so cgroupfs can opt in, everything else stay out,
and we know where to look when memory goes missing.
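
As a rough illustration only (this is not code from the patch series), the two gates proposed above can be modelled in a few lines of Python. The names `Filesystem`, `allows_user_xattr` and `may_modify_user_xattr` are hypothetical and exist only for this sketch; the one real-world value is the number 14 for CAP_IPC_LOCK, taken from <linux/capability.h>.

```python
from dataclasses import dataclass

# Capability number as defined in <linux/capability.h>.
CAP_IPC_LOCK = 14


@dataclass(frozen=True)
class Filesystem:
    """Hypothetical stand-in for a filesystem type (not a kernel structure)."""
    name: str
    allows_user_xattr: bool = False  # opt-in flag: off unless the fs asks for it


def may_modify_user_xattr(fs: Filesystem, caps: frozenset) -> bool:
    """Decide whether setting or removing a user.* xattr is permitted.

    Both conditions must hold: the filesystem has opted in, and the
    caller holds CAP_IPC_LOCK.
    """
    if not fs.allows_user_xattr:
        return False  # filesystem stayed out, so refuse regardless of caller
    return CAP_IPC_LOCK in caps


cgroupfs = Filesystem("cgroup", allows_user_xattr=True)
tmpfs = Filesystem("tmpfs")  # assumed here not to opt in
```

With this model, a caller holding CAP_IPC_LOCK is allowed on `cgroupfs` and refused on `tmpfs`, and a caller without the capability is refused everywhere.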

Will "lsattr -R" in the cgroupfs mountpoint do enough to judge how
much memory is being used in this way? I expect not, but I'm
unfamiliar with it: you may need to show counts elsewhere.
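
For a userspace estimate, one option is a small script that walks the mountpoint and sums attribute name and value sizes. The sketch below is an assumption about how such a check might look, not an existing tool: `xattr_usage` is a hypothetical helper built on Python's Linux-only `os.listxattr` and `os.getxattr`. It counts only the bytes userspace can see, so it will understate the kernel memory actually consumed, since per-attribute allocation overhead is invisible from here. Note also that lsattr reports chattr-style file flags rather than extended attributes; `getfattr -R -d` from the attr package is the closer existing command.

```python
import os


def xattr_usage(root, listxattr=None, getxattr=None):
    """Walk root and return (attribute_count, name_plus_value_bytes).

    The listxattr/getxattr parameters exist so the walk can be exercised
    with stand-in functions; by default the real os calls are used.
    """
    listxattr = listxattr or os.listxattr
    getxattr = getxattr or os.getxattr
    count = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        # Check the directory itself, then every file directly inside it.
        entries = [dirpath] + [os.path.join(dirpath, f) for f in filenames]
        for entry in entries:
            try:
                names = listxattr(entry, follow_symlinks=False)
            except OSError:
                continue  # no xattr support here, or the entry vanished
            for name in names:
                try:
                    value = getxattr(entry, name, follow_symlinks=False)
                except OSError:
                    continue  # attribute removed between list and get
                count += 1
                total += len(os.fsencode(name)) + len(value)
    return count, total
```

Calling `xattr_usage("/sys/fs/cgroup")` (the path is only an example) would return the number of attributes found and their combined size in bytes.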

If we keep an eye on those counts as systemd starts to make use of
this feature, perhaps a real case for making this memory swappable
will emerge; but more likely, a case for systemd to be economical
with them - they may be good for storing paths to data blobs, but
I doubt they're good for large blobs.
