Re: [PATCH] inotify: hide internal kernel bits from fdinfo

From: Eric Paris
Date: Mon Sep 21 2015 - 15:26:14 EST


Acked-by: Eric Paris <eparis@xxxxxxxxxx>

On Mon, 2015-09-21 at 11:45 -0700, Dave Hansen wrote:
> From: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
>
> There was a report that my patch:
>
> 	inotify: actually check for invalid bits in sys_inotify_add_watch()
>
> broke CRIU.
>
> The reason is that CRIU looks up raw flags in /proc/$pid/fdinfo/*
> to figure out how to rebuild inotify watches and then passes those
> flags directly back in to the inotify API. One of those flags
> (FS_EVENT_ON_CHILD) is set in mark->mask, but is not part of the
> inotify API. It is used inside the kernel to _implement_ inotify
> but it is not and has never been part of the API.
>
> My patch above ensured that we only allow bits which are part of
> the API (IN_ALL_EVENTS). This broke CRIU.
>
> FS_EVENT_ON_CHILD is really internal to the kernel. It is set
> _anyway_ on all inotify marks. So, CRIU was really just trying
> to set a bit that was already set.
>
> This patch hides that bit from fdinfo. CRIU will not see the
> bit, not try to set it, and should work as before. We should not
> have been exposing this bit in the first place, so this is a good
> patch independent of the CRIU problem.
>
> Signed-off-by: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
> Reported-by: Andrey Wagin <avagin@xxxxxxxxx>
> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Cc: Cyrill Gorcunov <gorcunov@xxxxxxxxxx>
> Cc: xemul@xxxxxxxxxxxxx
> Cc: Eric Paris <eparis@xxxxxxxxxx>
> Cc: john@xxxxxxxxxxxxxxxxx
> Cc: rlove@xxxxxxxxx
> Cc: linux-kernel@xxxxxxxxxxxxxxx
> ---
>
> b/fs/notify/fdinfo.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff -puN fs/notify/fdinfo.c~fdinfo-mask fs/notify/fdinfo.c
> --- a/fs/notify/fdinfo.c~fdinfo-mask	2015-09-21 10:24:01.031864268 -0700
> +++ b/fs/notify/fdinfo.c	2015-09-21 10:25:04.335723826 -0700
> @@ -82,9 +82,16 @@ static void inotify_fdinfo(struct seq_fi
>  	inode_mark = container_of(mark, struct inotify_inode_mark, fsn_mark);
>  	inode = igrab(mark->inode);
>  	if (inode) {
> +		/*
> +		 * IN_ALL_EVENTS represents all of the mask bits
> +		 * that we expose to userspace.  There is at
> +		 * least one bit (FS_EVENT_ON_CHILD) which is
> +		 * used only internally to the kernel.
> +		 */
> +		u32 mask = mark->mask & IN_ALL_EVENTS;
>  		seq_printf(m, "inotify wd:%x ino:%lx sdev:%x mask:%x ignored_mask:%x ",
>  			   inode_mark->wd, inode->i_ino, inode->i_sb->s_dev,
> -			   mark->mask, mark->ignored_mask);
> +			   mask, mark->ignored_mask);
>  		show_mark_fhandle(m, inode);
>  		seq_putc(m, '\n');
>  		iput(inode);
> _
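
For anyone reading fdinfo from userspace on a kernel without this change,
the same masking can be applied on the consumer side before the value is
handed back to inotify_add_watch(). What follows is only a minimal sketch
under stated assumptions, not CRIU's actual code: the helper names
parse_fdinfo_mask() and sanitize_inotify_mask() are hypothetical, and the
parser assumes the "inotify wd:... mask:... ignored_mask:..." line layout
printed by the seq_printf() call in the patch above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/inotify.h>

/*
 * Hypothetical helper (not taken from CRIU or the kernel): extract the
 * "mask:" field from one "inotify ..." line of /proc/<pid>/fdinfo/<fd>.
 * Returns 0 on success, -1 if the line has no parsable mask field.
 */
static int parse_fdinfo_mask(const char *line, uint32_t *mask)
{
	const char *p;
	unsigned int val;

	assert(line && mask);

	/* Search with a leading space so "ignored_mask:" does not match. */
	p = strstr(line, " mask:");
	if (!p || sscanf(p, " mask:%x", &val) != 1)
		return -1;

	*mask = val;
	return 0;
}

/*
 * Hypothetical helper: keep only the event bits that <sys/inotify.h>
 * groups under IN_ALL_EVENTS, dropping anything else that an older
 * kernel may have printed (for example the internal FS_EVENT_ON_CHILD).
 */
static uint32_t sanitize_inotify_mask(uint32_t raw)
{
	return raw & IN_ALL_EVENTS;
}
```

A caller would read the fdinfo line, run it through parse_fdinfo_mask(),
and pass sanitize_inotify_mask(mask) rather than the raw value when
re-creating the watch. Note that this mirrors the kernel-side expression
"mark->mask & IN_ALL_EVENTS" and so also discards any non-event bits
outside IN_ALL_EVENTS.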