Re: [PATCH 13/13] xfs: convert xfs_sysfs attrs to use ->seq_show

From: Greg Kroah-Hartman
Date: Tue Sep 14 2021 - 01:13:10 EST


On Tue, Sep 14, 2021 at 11:20:29AM +1000, Dave Chinner wrote:
> On Mon, Sep 13, 2021 at 08:27:50AM +0200, Greg Kroah-Hartman wrote:
> > On Mon, Sep 13, 2021 at 07:41:21AM +0200, Christoph Hellwig wrote:
> > > Trivial conversion to the seq_file based sysfs attributes.
> > >
> > > Signed-off-by: Christoph Hellwig <hch@xxxxxx>
> > > ---
> > > fs/xfs/xfs_stats.c | 24 +++++-------
> > > fs/xfs/xfs_stats.h | 2 +-
> > > fs/xfs/xfs_sysfs.c | 96 +++++++++++++++++++++++-----------------------
> > > 3 files changed, 58 insertions(+), 64 deletions(-)
> > >
> > > diff --git a/fs/xfs/xfs_stats.c b/fs/xfs/xfs_stats.c
> > > index 20e0534a772c9..71e7a84ba0403 100644
> > > --- a/fs/xfs/xfs_stats.c
> > > +++ b/fs/xfs/xfs_stats.c
> > > @@ -16,10 +16,9 @@ static int counter_val(struct xfsstats __percpu *stats, int idx)
> > > return val;
> > > }
> > >
> > > -int xfs_stats_format(struct xfsstats __percpu *stats, char *buf)
> > > +void xfs_stats_format(struct xfsstats __percpu *stats, struct seq_file *sf)
> > > {
> > > int i, j;
> > > - int len = 0;
> > > uint64_t xs_xstrat_bytes = 0;
> > > uint64_t xs_write_bytes = 0;
> > > uint64_t xs_read_bytes = 0;
> > > @@ -58,13 +57,12 @@ int xfs_stats_format(struct xfsstats __percpu *stats, char *buf)
> > > /* Loop over all stats groups */
> > >
> > > for (i = j = 0; i < ARRAY_SIZE(xstats); i++) {
> > > - len += scnprintf(buf + len, PATH_MAX - len, "%s",
> > > - xstats[i].desc);
> > > + seq_printf(sf, "%s", xstats[i].desc);
> > > +
> > > /* inner loop does each group */
> > > for (; j < xstats[i].endpoint; j++)
> > > - len += scnprintf(buf + len, PATH_MAX - len, " %u",
> > > - counter_val(stats, j));
> > > - len += scnprintf(buf + len, PATH_MAX - len, "\n");
> > > + seq_printf(sf, " %u", counter_val(stats, j));
> > > + seq_printf(sf, "\n");
> > > }
> > > /* extra precision counters */
> > > for_each_possible_cpu(i) {
> > > @@ -74,18 +72,14 @@ int xfs_stats_format(struct xfsstats __percpu *stats, char *buf)
> > > defer_relog += per_cpu_ptr(stats, i)->s.defer_relog;
> > > }
> > >
> > > - len += scnprintf(buf + len, PATH_MAX-len, "xpc %Lu %Lu %Lu\n",
> > > + seq_printf(sf, "xpc %Lu %Lu %Lu\n",
> > > xs_xstrat_bytes, xs_write_bytes, xs_read_bytes);
> > > - len += scnprintf(buf + len, PATH_MAX-len, "defer_relog %llu\n",
> > > - defer_relog);
> > > - len += scnprintf(buf + len, PATH_MAX-len, "debug %u\n",
> > > + seq_printf(sf, "defer_relog %llu\n", defer_relog);
> > > #if defined(DEBUG)
> > > - 1);
> > > + seq_printf(sf, "debug 1\n");
> > > #else
> > > - 0);
> > > + seq_printf(sf, "debug 0\n");
> > > #endif
> > > -
> > > - return len;
> > > }
> >
> > That is a sysfs file? What happened to the "one value per file" rule
> > here?
>
>
> There is no "rule" that says syfs files must contain one value per
> file; the documentation says that one value per file is the
> "preferred" format. Documentation/filesystems/sysfs.rst:
>
> [...]
> Attributes
> ...
> Attributes should be ASCII text files, preferably with only one value
> per file. It is noted that it may not be efficient to contain only one
> value per file, so it is socially acceptable to express an array of
> values of the same type.
> [...]
>

An array of values is one thing like "what is the power states for this
device". A list of different key/value pairs is a totally different
thing entirely.

> We are exposing a large array of integer values here, so multiple
> values per file are explicitly considered an acceptible format.

Not really, that was not the goal of sysfs at all.

> Further, as there are roughly 200 individual stats in this file and
> calculating each stat requires per-cpu aggregation, the the cost of
> calculating and reading each stat individually is prohibitive, not
> just inefficient.

Have you measured it? How often does the file get read and by what
tools?

We have learned from our past mistakes in /proc where we did this in the
past and required keeping obsolete values and constantly tweaking
userspace parsers. That is why we made sysfs one-value-per-file. If
the file is not there, the value is not there, much easier to handle
future changes.

> So, yes, we might have multiple lines in the file that you can frown
> about, but OTOH the file format has been exposed as a kernel ABI for
> a couple of decades via /proc/fs/xfs/stat.

proc had no such rules, but we have learned :)

> Hence exposing it in
> sysfs to provide a more fine-grained breakdown of the stats (per
> mount instead of global) is a no-brainer. We don't have to rewrite
> the parsing engines in multiple userspace monitoring programs to
> extract this information from the kernel - they just create a new
> instance and read a different file and it all just works.

But then you run into the max size restriction on sysfs files
(PAGE_SIZE) and things break down.

Please don't do this.

> Indeed, there's precedence for such /proc file formats in more
> fine-grained sysfs files. e.g. /sys/bus/node/devices/node<n>/vmstat
> and /sys/bus/node/devices/node<n>/meminfo retain the same format
> (and hence userspace parsers) for the per-node stats as /proc/vmstat
> and /proc/meminfo use for the global stats...

And I have complained about those files in the past many times. And
they are running into problems in places dealing with them too.

> tl;dr: the file contains arrays of values, it's inefficient to read
> values one at a time, it's a pre-existing ABI-constrainted file
> format, there's precedence in core kernel statistics
> implementations and the documented guidelines allow this sort of
> usage in these cases.

I would prefer not to do this, and I will not take core sysfs changes to
make this any easier.

Which is one big reason why I don't like just making sysfs use the seq
file api, it would allow stuff like this to propagate to other places in
the kernel.

Maybe I should cut the file size of a sysfs file down to PAGE_SIZE/4 or
less, that might be better :)

thanks,

greg k-h