Re: [PATCH] fanotify: to differ file access event from differentthreads

From: Mihai DonÈu
Date: Mon Feb 11 2013 - 14:56:42 EST


On Tue, 6 Dec 2011 09:23:25 +0800 boyd yang wrote:
> fanotify: to differ file access event from different threads
> When fanotify is monitoring the whole mount point "/", and multiple
> threads iterate the same direcotry, some thread will hang.
> This patch let fanotify to differ access events from different
> threads, prevent fanotify from merging access events from different
> threads.
> It also hide overflow events to reach user space.
> Signed-off-by: Boyd Yang <boyd.yang@xxxxxxxxx>
>
> diff -r -u linux-3.1-rc4_orig/fs/notify/fanotify/fanotify.c
> linux-3.1-rc4/fs/notify/fanotify/fanotify.c
> --- linux-3.1-rc4_orig/fs/notify/fanotify/fanotify.c 2011-08-29
> 12:16:01.000000000 +0800
> +++ linux-3.1-rc4/fs/notify/fanotify/fanotify.c 2011-10-14
> 14:17:53.055958000 +0800
> @@ -15,7 +15,8 @@
>
> if (old->to_tell == new->to_tell &&
> old->data_type == new->data_type &&
> - old->tgid == new->tgid) {
> + old->tgid == new->tgid &&
> + old->pid == new->pid) {
> switch (old->data_type) {
> case (FSNOTIFY_EVENT_PATH):
> if ((old->path.mnt == new->path.mnt) &&
> @@ -144,11 +145,16 @@
> return PTR_ERR(notify_event);
>
> #ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS
> - if (event->mask & FAN_ALL_PERM_EVENTS) {
> - /* if we merged we need to wait on the new event */
> - if (notify_event)
> - event = notify_event;
> - ret = fanotify_get_response_from_access(group,
> event);
> + /*if overflow, do not wait for response*/
> + if (event->mask&FS_Q_OVERFLOW) {
> + pr_debug("fanotify overflow!\n");
> + } else {
> + if (event->mask & FAN_ALL_PERM_EVENTS) {
> + /* if we merged we need to wait on the new
> event */
> + if (notify_event)
> + event = notify_event;
> + ret =
> fanotify_get_response_from_access(group, event);
> + }
> }
> #endif
>
> diff -r -u linux-3.1-rc4_orig/fs/notify/notification.c
> linux-3.1-rc4/fs/notify/notification.c
> --- linux-3.1-rc4_orig/fs/notify/notification.c 2011-08-29
> 12:16:01.000000000 +0800
> +++ linux-3.1-rc4/fs/notify/notification.c 2011-10-14
> 13:52:36.946608000 +0800 @@ -95,6 +95,7 @@
> BUG_ON(!list_empty(&event->private_data_list));
>
> kfree(event->file_name);
> + put_pid(event->pid);
> put_pid(event->tgid);
> kmem_cache_free(fsnotify_event_cachep, event);
> }
> @@ -374,6 +375,7 @@
> return NULL;
> }
> }
> + event->pid = get_pid(old_event->pid);
> event->tgid = get_pid(old_event->tgid);
> if (event->data_type == FSNOTIFY_EVENT_PATH)
> path_get(&event->path);
> @@ -417,6 +419,7 @@
> event->name_len = strlen(event->file_name);
> }
>
> + event->pid = get_pid(task_pid(current));
> event->tgid = get_pid(task_tgid(current));
> event->sync_cookie = cookie;
> event->to_tell = to_tell;
> diff -r -u linux-3.1-rc4_orig/include/linux/fsnotify_backend.h
> linux-3.1-rc4/include/linux/fsnotify_backend.h
> --- linux-3.1-rc4_orig/include/linux/fsnotify_backend.h
> 2011-08-29 12:16:01.000000000 +0800
> +++ linux-3.1-rc4/include/linux/fsnotify_backend.h 2011-10-14
> 13:51:50.380168000 +0800
> @@ -238,6 +238,7 @@
> u32 sync_cookie; /* used to corrolate events, namely
> inotify mv events */ const unsigned char *file_name;
> size_t name_len;
> + struct pid *pid;
> struct pid *tgid;
>
> #ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS
>

This patch triggers the following on my 3.7.6 kernel:

INFO: rcu_sched detected stalls on CPUs/tasks: { 1} (detected by 0,
t=15002 jiffies) sending NMI to all CPUs:
NMI backtrace for cpu 0
CPU 0
Modules linked in: ext2 ppdev parport_pc mac_hid psmouse serio_raw
i2c_piix4 lp parport 8139too floppy 8139cp

Pid: 0, comm: swapper/0 Not tainted 3.2.35 #12 Bochs Bochs
RIP: 0010:[<ffffffff81037bdf>] [<ffffffff81037bdf>]
flat_send_IPI_all+0xaf/0xd0 RSP: 0018:ffff88003fc03d88 EFLAGS: 00010006
RAX: 0000000000000000 RBX: 0000000000000046 RCX: 0000000000000050
RDX: 0000000000000000 RSI: 0000000000000082 RDI: 0000000000000300
RBP: ffff88003fc03da8 R08: 000000000000000a R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000c00
R13: 0000000003000000 R14: 0000000000000001 R15: ffffffff81c32d00
FS: 0000000000000000(0000) GS:ffff88003fc00000(0000)
knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007fb5d3100000 CR3: 000000001ca06000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper/0 (pid: 0, threadinfo ffffffff81c00000, task
ffffffff81c0d020) Stack:
0000000000000000 0000000000002710 ffffffff81c31c00 ffffffff81c31d00
ffff88003fc03dc8 ffffffff81033231 000000000000000a ffff88003fc0ec40
ffff88003fc03e18 ffffffff810defbe ffff880000000001 ffffffff81c32d00
Call Trace:
<IRQ>
[<ffffffff81033231>] arch_trigger_all_cpu_backtrace+0x61/0xa0
[<ffffffff810defbe>] __rcu_pending+0x3ae/0x420
[<ffffffff810df329>] rcu_check_callbacks+0x79/0x1e0
[<ffffffff81078068>] update_process_times+0x48/0x90
[<ffffffff8109b4f4>] tick_sched_timer+0x64/0xc0
[<ffffffff8108df56>] __run_hrtimer+0x76/0x1f0
[<ffffffff8109b490>] ? tick_nohz_handler+0x100/0x100
[<ffffffff8108e907>] hrtimer_interrupt+0xf7/0x230
[<ffffffff81650409>] smp_apic_timer_interrupt+0x69/0x99
[<ffffffff8164e2de>] apic_timer_interrupt+0x6e/0x80
<EOI>
[<ffffffff810904a5>] ? sched_clock_local+0x25/0x90
[<ffffffff8103cedb>] ? native_safe_halt+0xb/0x10
[<ffffffff8101c6b3>] default_idle+0x53/0x1d0
[<ffffffff81013236>] cpu_idle+0xd6/0x120
[<ffffffff816172ce>] rest_init+0x72/0x74
[<ffffffff81cfcba5>] start_kernel+0x3b0/0x3bd
[<ffffffff81cfc347>] x86_64_start_reservations+0x132/0x136
[<ffffffff81cfc140>] ? early_idt_handlers+0x140/0x140
[<ffffffff81cfc44d>] x86_64_start_kernel+0x102/0x111
[...]

It happens after my application runs for half an hour or so. However, I
don't see how this could possibly solve the problem I've observed: due
to a race, a kernel thread ends up doing wait_event() on an event which
soon after is merged by a different thread into a new one which becomes
the actual event to be "received" by the content introspection
application. It's easily reproducible with a simple script:

$ while true; do cp -f /root/eicar.com /root/watched-dir; done

all the while the fanotify application does a re-open (RD -> RDWR) and
truncate(0), on multiple threads.

(I do a fanotify_init(O_RDONLY) because of surprise ETXTBSY)

Anyway, regardless of how I use the API the race needs to be
eliminated somehow. So my problem now is: how do I switch all
wait_event()-users to the new event created by fanotify_merge()?

--
Mihai DonÈu
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/