Re: [PATCH -mm v2 1/3] mm/oom_kill: remove the wrong fatal_signal_pending() check in oom_kill_process()

From: Tetsuo Handa
Date: Fri Oct 02 2015 - 10:08:27 EST


Michal Hocko wrote:
> > --- a/fs/coredump.c
> > +++ b/fs/coredump.c
> > @@ -295,6 +295,8 @@ static int zap_process(struct task_struct *start, int exit_code, int flags)
> > for_each_thread(start, t) {
> > task_clear_jobctl_pending(t, JOBCTL_PENDING_MASK);
> > if (t != current && t->mm) {
> > + printk(KERN_INFO "Setting SIGKILL to %s(%u)\n",
> > + t->comm, t->pid);
> > sigaddset(&t->pending.signal, SIGKILL);
> > signal_wake_up(t, 1);
> > nr++;
> > ---------- debug printk() patch end ----------
>
> OK, but all your tasks should trigger SEGV. You cannot find out whether
> all of them happened from the same zap_process, can you.

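Indeed. I retested with an updated patch. Not all of them are killed by
the same zap_process() call, but all of them are killed by the same SEGV
event: a.out(11058) is the only caller of zap_process() in the log below.
I think the coredump stops every thread that shares the same memory,
including threads that belong to other thread groups.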

---------- debug printk() patch start ----------
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -292,14 +292,18 @@ static int zap_process(struct task_struct *start, int exit_code, int flags)
 	start->signal->group_exit_code = exit_code;
 	start->signal->group_stop_count = 0;

+	printk(KERN_INFO "%s(%d): Started zap_process()\n", current->comm, current->pid);
 	for_each_thread(start, t) {
 		task_clear_jobctl_pending(t, JOBCTL_PENDING_MASK);
 		if (t != current && t->mm) {
+			printk(KERN_INFO "%s(%d): Setting SIGKILL to %s(%u)\n",
+			       current->comm, current->pid, t->comm, t->pid);
 			sigaddset(&t->pending.signal, SIGKILL);
 			signal_wake_up(t, 1);
 			nr++;
 		}
 	}
+	printk(KERN_INFO "%s(%d): zap_process() returned %d\n", current->comm, current->pid, nr);

 	return nr;
 }
---------- debug printk() patch end ----------

---------- kernel log start ----------
[ 71.808316] a.out[11057]: segfault at 7ac768 ip 00000000004007be sp 00000000007ac770 error 6 in a.out[400000+1000]
[ 71.811003] a.out[11058]: segfault at 7ad778 ip 00000000004007be sp 00000000007ad780 error 6
[ 71.811005] in a.out[400000+1000]
[ 71.813817] a.out(11058): Started zap_process()
[ 71.813818] a.out(11058): Setting SIGKILL to a.out(11056)
[ 71.813841] a.out(11058): Setting SIGKILL to a.out(11057)
[ 71.813854] a.out(11058): Setting SIGKILL to a.out(11059)
[ 71.813855] a.out(11058): Setting SIGKILL to a.out(11060)
[ 71.813855] a.out(11058): Setting SIGKILL to a.out(11061)
[ 71.813857] a.out(11058): zap_process() returned 5
[ 71.813880] a.out(11058): Started zap_process()
[ 71.813880] a.out(11058): Setting SIGKILL to a.out(11062)
[ 71.813881] a.out(11058): zap_process() returned 1
[ 71.813881] a.out(11058): Started zap_process()
[ 71.813882] a.out(11058): Setting SIGKILL to a.out(11063)
[ 71.813882] a.out(11058): zap_process() returned 1
[ 71.813882] a.out(11058): Started zap_process()
[ 71.813883] a.out(11058): Setting SIGKILL to a.out(11064)
[ 71.813883] a.out(11058): zap_process() returned 1
[ 71.813884] a.out(11058): Started zap_process()
[ 71.813884] a.out(11058): Setting SIGKILL to a.out(11065)
[ 71.813899] a.out(11058): zap_process() returned 1
[ 71.813900] a.out(11058): Started zap_process()
[ 71.813900] a.out(11058): Setting SIGKILL to a.out(11066)
[ 71.813901] a.out(11058): zap_process() returned 1
[ 71.813901] a.out(11058): Started zap_process()
[ 71.813902] a.out(11058): Setting SIGKILL to a.out(11067)
[ 71.813902] a.out(11058): zap_process() returned 1
[ 71.813903] a.out(11058): Started zap_process()
[ 71.813904] a.out(11058): Setting SIGKILL to a.out(11068)
[ 71.813904] a.out(11058): zap_process() returned 1
[ 71.813905] a.out(11058): Started zap_process()
[ 71.813905] a.out(11058): Setting SIGKILL to a.out(11069)
[ 71.813906] a.out(11058): zap_process() returned 1
[ 71.813906] a.out(11058): Started zap_process()
[ 71.813907] a.out(11058): Setting SIGKILL to a.out(11070)
[ 71.813907] a.out(11058): zap_process() returned 1
[ 71.813908] a.out(11058): Started zap_process()
[ 71.813908] a.out(11058): Setting SIGKILL to a.out(11071)
[ 71.813908] a.out(11058): zap_process() returned 1
[ 71.813925] a.out[11068]: segfault at 7b7818 ip 00000000004007be sp 00000000007b7820 error 6 in a.out[400000+1000]
[ 71.813938] a.out[11063]: segfault at 7b27c8 ip 00000000004007be sp 00000000007b27d0 error 6
[ 71.813940] a.out[11064]: segfault at 7b37d8 ip 00000000004007be sp 00000000007b37e0 error 6
[ 71.813941] a.out[11066]: segfault at 7b57f8 ip 00000000004007be sp 00000000007b5800 error 6
[ 71.813943] a.out[11060]: segfault at 7af798 ip 00000000004007be sp 00000000007af7a0 error 6
[ 71.813945] a.out[11062]: segfault at 7b17b8 ip 00000000004007be sp 00000000007b17c0 error 6
[ 71.813946] a.out[11067]: segfault at 7b6808 ip 00000000004007be sp 00000000007b6810 error 6
[ 71.813994] a.out[11070]: segfault at 7b9838 ip 00000000004007be sp 00000000007b9840 error 6
[ 71.813995] in a.out[400000+1000]
[ 71.813998] in a.out[400000+1000]
[ 71.814002] in a.out[400000+1000]
[ 71.814004] in a.out[400000+1000]
[ 71.814008] in a.out[400000+1000]
[ 71.814011] in a.out[400000+1000]
[ 71.814015] in a.out[400000+1000]
---------- kernel log end ----------
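
The pattern in the log above (one zap_process() call returning 5, then
ten calls returning 1, all made by a.out(11058)) can be reproduced with a
small userspace model. This is only a sketch that loosely follows the
zap_threads()/zap_process() structure in fs/coredump.c. The names
struct model_task, model_zap_process(), model_coredump() and
model_fill_from_log() are hypothetical and are not kernel API. The task
layout (11056..11061 forming one thread group led by 11056, and
11062..11071 being separate single-threaded groups that share the same
mm) is an assumption inferred from the log, not something the log states.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NTASKS 16

/* Userspace stand-in for a task; NOT the kernel's struct task_struct. */
struct model_task {
	int pid;    /* thread id */
	int tgid;   /* thread-group id (pid of the assumed group leader) */
	int mm_id;  /* stand-in for the shared mm pointer */
	int killed; /* set when the model "sends" SIGKILL */
};

/* Model of zap_process(): mark every thread of one group except current. */
int model_zap_process(struct model_task *tasks, size_t n, int tgid,
		      int current_pid)
{
	int nr = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (tasks[i].tgid == tgid && tasks[i].pid != current_pid) {
			tasks[i].killed = 1;
			nr++;
		}
	}
	return nr;
}

/*
 * Model of one coredump event: zap the dumper's own thread group first,
 * then every other thread group that shares the same mm.
 * Returns the number of model_zap_process() calls; the number of marked
 * tasks is stored in *total_killed.
 */
int model_coredump(struct model_task *tasks, size_t n, int current_pid,
		   int *total_killed)
{
	int cur_tgid = -1, cur_mm = -1, calls;
	size_t i;

	for (i = 0; i < n; i++) {
		if (tasks[i].pid == current_pid) {
			cur_tgid = tasks[i].tgid;
			cur_mm = tasks[i].mm_id;
		}
	}
	assert(cur_tgid != -1); /* current must be one of the tasks */

	*total_killed = model_zap_process(tasks, n, cur_tgid, current_pid);
	calls = 1;

	for (i = 0; i < n; i++) {
		/* Visit each other group once, via its leader. */
		if (tasks[i].pid != tasks[i].tgid || tasks[i].tgid == cur_tgid)
			continue;
		if (tasks[i].mm_id != cur_mm)
			continue;
		*total_killed += model_zap_process(tasks, n, tasks[i].tgid,
						   current_pid);
		calls++;
	}
	return calls;
}

/* Assumed layout: one 6-thread group plus ten 1-thread groups, one mm. */
void model_fill_from_log(struct model_task tasks[MODEL_NTASKS])
{
	int i;

	for (i = 0; i < MODEL_NTASKS; i++) {
		int pid = 11056 + i;

		tasks[i].pid = pid;
		tasks[i].tgid = (pid <= 11061) ? 11056 : pid;
		tasks[i].mm_id = 1;
		tasks[i].killed = 0;
	}
}
```

Calling model_fill_from_log() and then model_coredump() with 11058 as the
current pid gives 11 model_zap_process() calls and 15 marked tasks, which
matches the 11 "Started zap_process()" lines and the 15 "Setting SIGKILL"
lines in the log.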