Re: WARNING in task_participate_group_stop

From: Dmitry Vyukov
Date: Mon Nov 02 2015 - 09:22:11 EST


On Mon, Nov 2, 2015 at 4:13 PM, Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
> Hi Dmitry,
>
> On 11/02, Dmitry Vyukov wrote:
>>
>> WARNING: CPU: 1 PID: 1 at kernel/signal.c:334
>> task_participate_group_stop+0x157/0x1d0()
>> Modules linked in:
>> CPU: 1 PID: 1 Comm: init Not tainted 4.3.0 #48
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>> ffffffff82e40280 ffff88003eb0fae0 ffffffff819efe55 0000000000000000
>> ffff88003eb0fb20 ffffffff810ec871 ffffffff8110f4d7 ffff88003eb00000
>> ffff88003eb20000 0000000000000000 ffff88003eb0fbf8 ffff88003eb20000
>> Call Trace:
>> [<ffffffff810eca35>] warn_slowpath_null+0x15/0x20 kernel/panic.c:480
>> [<ffffffff8110f4d7>] task_participate_group_stop+0x157/0x1d0
>> kernel/signal.c:334
>> [<ffffffff81113587>] do_signal_stop+0x1e7/0x6e0 kernel/signal.c:2060
>> [<ffffffff81116ab7>] get_signal+0x387/0x11b0 kernel/signal.c:2316
>> [<ffffffff8100cf0d>] do_signal+0x8d/0x19e0 arch/x86/kernel/signal.c:707
>> [<ffffffff81005d8d>] prepare_exit_to_usermode+0x11d/0x170
>> arch/x86/entry/common.c:251
>> [<ffffffff81005e83>] syscall_return_slowpath+0xa3/0x2b0
>> arch/x86/entry/common.c:317
>> [<ffffffff82d4f6a7>] int_ret_from_sys_call+0x25/0x8f
>> arch/x86/entry/entry_64.S:281
>> ---[ end trace f6697fd630b7c361 ]---
>>
>>
>> The reproducer is (needs to be run as root):
>>
>> // autogenerated by syzkaller (http://github.com/google/syzkaller)
>> #include <sys/ptrace.h>
>> #include <unistd.h>
>>
>> int main()
>> {
>> int pid = 1;
>> ptrace(PTRACE_ATTACH, pid, 0, 0);
>> ptrace(PTRACE_SETOPTIONS, pid, 0, PTRACE_O_EXITKILL);
>> sleep(1);
>> return 0;
>> }
>
> Thanks.
>
> Can't reproduce, but at first glance the problem looks clear...

Humm... did you run as root?
It reproduces all the time on my 4.3 kernel VM. Also firmly killed my
desktop running 3.13.



>> Yes, it is weird and it kills init right afterwards.
>
> Could you confirm that this WARN_ON() happens _after_ the reproducer exits?
>
>> But I wasn't able
>> to figure out what's the root cause (why task does not have
>> JOBCTL_STOP_PENDING) and maybe the same WARNING can be triggered
>> without root and/or with other than init process. So still posting it
>> here.
>
> Yes I think you are right. SIGSTOP can race with SIGKILL which (unlike SIGCONT)
> doesn't clear JOBCTL_STOP_DEQUEUED/PENDING/etc.
>
> This is mostly fine, the task won't block in TASK_STOPPED if SIGKILL is pending,
> but still is not right and leads to the warning above: JOBCTL_STOP_PENDING was not
> set because do_signal_stop()->task_set_jobctl_pending() checks fatal_signal_pending().
>
> Probably the patch below should fix the problem, but I'd like to think more before
> I send the fix.


Will test it.


> Oleg.
>
> --- x/kernel/signal.c
> +++ x/kernel/signal.c
> @@ -2002,7 +2002,7 @@ static bool do_signal_stop(int signr)
> WARN_ON_ONCE(signr & ~JOBCTL_STOP_SIGMASK);
>
> if (!likely(current->jobctl & JOBCTL_STOP_DEQUEUED) ||
> - unlikely(signal_group_exit(sig)))
> + unlikely(fatal_signal_pending(current)))
> return false;
> /*
> * There is no group stop already in progress. We must
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/