Re: [PATCH v2 1/2] procfs: Add 'size' to /proc/<pid>/fdinfo/

From: Brian Foster
Date: Thu Jun 30 2022 - 07:49:07 EST


On Wed, Jun 29, 2022 at 01:43:11PM -0700, Kalesh Singh wrote:
> On Wed, Jun 29, 2022 at 5:23 AM Brian Foster <bfoster@xxxxxxxxxx> wrote:
> >
> > On Tue, Jun 28, 2022 at 03:38:02PM -0700, Kalesh Singh wrote:
> > > On Tue, Jun 28, 2022 at 4:54 AM Brian Foster <bfoster@xxxxxxxxxx> wrote:
> > > >
> > > > On Thu, Jun 23, 2022 at 03:06:06PM -0700, Kalesh Singh wrote:
> > > > > To be able to account the amount of memory a process is keeping pinned
> > > > > by open file descriptors add a 'size' field to fdinfo output.
> > > > >
> > > > > dmabufs fds already expose a 'size' field for this reason, remove this
> > > > > and make it a common field for all fds. This allows tracking of
> > > > > other types of memory (e.g. memfd and ashmem in Android).
> > > > >
> > > > > Signed-off-by: Kalesh Singh <kaleshsingh@xxxxxxxxxx>
> > > > > Reviewed-by: Christian König <christian.koenig@xxxxxxx>
> > > > > ---
> > > > >
> > > > > Changes in v2:
> > > > > - Add Christian's Reviewed-by
> > > > >
> > > > > Changes from rfc:
> > > > > > - Split adding 'size' and 'path' into separate patches, per Christian
> > > > > > - Split fdinfo seq_printf into separate lines, per Christian
> > > > > > - Fix indentation (use tabs) in documentation, per Randy
> > > > >
> > > > > Documentation/filesystems/proc.rst | 12 ++++++++++--
> > > > > drivers/dma-buf/dma-buf.c | 1 -
> > > > > fs/proc/fd.c | 9 +++++----
> > > > > 3 files changed, 15 insertions(+), 7 deletions(-)
> > > > >
> > ...
> > > >
> > > > Also not sure if it matters that much for your use case, but something
> > > > worth noting at least with shmem is that one can do something like:
> > > >
> > > > # cat /proc/meminfo | grep Shmem:
> > > > Shmem: 764 kB
> > > > # xfs_io -fc "falloc -k 0 10m" ./file
> > > > # ls -alh file
> > > > -rw-------. 1 root root 0 Jun 28 07:22 file
> > > > # stat file
> > > > File: file
> > > > Size: 0 Blocks: 20480 IO Block: 4096 regular empty file
> > > > # cat /proc/meminfo | grep Shmem:
> > > > Shmem: 11004 kB
> > > >
> > > > ... where the resulting memory usage isn't reflected in i_size (but
> > > > is in i_blocks/bytes).
> > >
> > > I tried a similar experiment a few times, but I don't see the same
> > > results. In my case, there is not any change in shmem. IIUC the
> > > fallocate is allocating the disk space not shared memory.
> > >
> >
> > Sorry, it was implied in my previous test that I was running against
> > tmpfs. So regardless of fs, the fallocate keep_size semantics shown in
> > both cases are as expected: the underlying blocks are allocated and the
> > inode size is unchanged.
> >
> > What wasn't totally clear to me when I read this patch was 1. whether
> > tmpfs refers to Shmem and 2. whether tmpfs allowed this sort of
> > operation. The test above seems to confirm both, however, right? E.g., a
> > more detailed example:
> >
> > # mount | grep /tmp
> > tmpfs on /tmp type tmpfs (rw,nosuid,nodev,seclabel,nr_inodes=1048576,inode64)
> > # cat /proc/meminfo | grep Shmem:
> > Shmem: 5300 kB
> > # xfs_io -fc "falloc -k 0 1g" /tmp/file
> > # stat /tmp/file
> > File: /tmp/file
> > Size: 0 Blocks: 2097152 IO Block: 4096 regular empty file
> > Device: 22h/34d Inode: 45 Links: 1
> > Access: (0600/-rw-------) Uid: ( 0/ root) Gid: ( 0/ root)
> > Context: unconfined_u:object_r:user_tmp_t:s0
> > Access: 2022-06-29 08:04:01.301307154 -0400
> > Modify: 2022-06-29 08:04:01.301307154 -0400
> > Change: 2022-06-29 08:04:01.451312834 -0400
> > Birth: 2022-06-29 08:04:01.301307154 -0400
> > # cat /proc/meminfo | grep Shmem:
> > Shmem: 1053876 kB
> > # rm -f /tmp/file
> > # cat /proc/meminfo | grep Shmem:
> > Shmem: 5300 kB
> >
> > So clearly this impacts Shmem.. was your test run against tmpfs or some
> > other (disk based) fs?
>
> Hi Brian,
>
> Thanks for clarifying. My issue was that tmpfs is not mounted at /tmp on my system:
>
> ==> meminfo.start <==
> Shmem: 572 kB
> ==> meminfo.stop <==
> Shmem: 51688 kB
>

Ok, makes sense.

> >
> > FWIW, I don't have any objection to exposing inode size if it's commonly
> > useful information. My feedback was more just an fyi that i_size doesn't
> > necessarily reflect underlying space consumption (whether it's memory or
> > disk space) in more generic cases, because it sounds like that is really
> > what you're after here. The opposite example to the above would be
> > something like an 'xfs_io -fc "truncate 1t" /tmp/file', which shows a
> > 1TB inode size with zero additional shmem usage.
>
> From these cases, it seems the more generic way to do this is by
> calculating the actual size consumed using the blocks. (i_blocks *
> 512). So in the latter example 'xfs_io -fc "truncate 1t" /tmp/file'
> the size consumed would be zero. Let me know if it sounds ok to you
> and I can repost the updated version.
>

That sounds a bit more useful to me if you're interested in space usage,
or at least I don't have a better idea for you. ;)
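
As a rough sanity check of that arithmetic, here is a small shell sketch
applied to the block counts from the stat output earlier in the thread. The
function name blocks_to_bytes is made up for illustration, and it assumes the
512-byte unit you mention above:

```shell
# Hypothetical helper: convert a count of 512-byte blocks to bytes.
blocks_to_bytes() {
    echo $(( $1 * 512 ))
}

blocks_to_bytes 20480     # block count after "falloc -k 0 10m"
blocks_to_bytes 2097152   # block count after "falloc -k 0 1g"
```

Those come out to 10485760 and 1073741824 bytes, i.e. the 10m and 1g
preallocations, even though Size is 0 in both cases.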

One thing to note is that I'm not sure whether all fs' use i_blocks
reliably. E.g., XFS populates stat->blocks via a separate block counter
in the XFS specific inode structure (see xfs_vn_getattr()). A bunch of
other fs' seem to touch it so perhaps that is just an outlier. You could
consider fixing that up, perhaps make a ->getattr() call to avoid it, or
just use the field directly if it's useful enough as is and there are no
other objections. Something to think about anyways..
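
For comparison, the userspace view of the ->getattr() route: stat(1) goes
through stat(2), so it reports whatever block count the fs fills in there
rather than reading i_blocks directly. A minimal sketch, assuming GNU
coreutils stat (the helper name size_vs_allocated is hypothetical):

```shell
# Hypothetical helper: print apparent size and allocated bytes for a file.
# stat format: %s = size in bytes, %b = number of allocated blocks,
# %B = size in bytes of each block counted by %b.
size_vs_allocated() {
    out=$(stat -c '%s %b %B' "$1") || return 1
    set -- $out
    echo "size=$1 allocated=$(( $2 * $3 ))"
}
```

Running "size_vs_allocated /tmp/file" after either the falloc -k or the
truncate example should show the two numbers diverging in opposite
directions.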

Brian

> Thanks,
> Kalesh
>
> >
> > Brian
> >
> > > cat /proc/meminfo > meminfo.start
> > > xfs_io -fc "falloc -k 0 50m" ./xfs_file
> > > cat /proc/meminfo > meminfo.stop
> > > tail -n +1 meminfo.st* | grep -i '==\|Shmem:'
> > >
> > > ==> meminfo.start <==
> > > Shmem: 484 kB
> > > ==> meminfo.stop <==
> > > Shmem: 484 kB
> > >
> > > ls -lh xfs_file
> > > -rw------- 1 root root 0 Jun 28 15:12 xfs_file
> > >
> > > stat xfs_file
> > > File: xfs_file
> > > Size: 0 Blocks: 102400 IO Block: 4096 regular empty file
> > >
> > > Thanks,
> > > Kalesh
> > >
> > > >
> > > > Brian
> > > >
> > > > >
> > > > > /* show_fd_locks() never deferences files so a stale value is safe */
> > > > > show_fd_locks(m, file, files);
> > > > > --
> > > > > 2.37.0.rc0.161.g10f37bed90-goog
> > > > >
> > > >
> > >
> >
> >
>