Re: [PATCH 1/8] signal: Make SIGKILL during coredumps an explicit special case

From: Dmitry Osipenko
Date: Tue Jan 18 2022 - 13:02:06 EST


18.01.2022 20:52, Eric W. Biederman wrote:
> Dmitry Osipenko <digetx@xxxxxxxxx> writes:
>
>> 11.01.2022 20:20, Eric W. Biederman wrote:
>>> Dmitry Osipenko <digetx@xxxxxxxxx> writes:
>>>
>>>> 08.01.2022 21:13, Eric W. Biederman wrote:
>>>>> Dmitry Osipenko <digetx@xxxxxxxxx> writes:
>>>>>
>>>>>> 05.01.2022 22:58, Eric W. Biederman wrote:
>>>>>>>
>>>>>>> I have not yet been able to figure out how to run gst-plugin-scanner in
>>>>>>> a way that triggers this. In truth I can't figure out how to
>>>>>>> run gst-plugin-scanner in a useful way.
>>>>>>>
>>>>>>> I am going to set up some unit tests and see if I can reproduce your
>>>>>>> hang another way, but if you could give me some more information on what
>>>>>>> you are doing to trigger this I would appreciate it.
>>>>>>
>>>>>> Thanks, Eric. The distro is Arch Linux, but it's a development
>>>>>> environment where I'm running latest GStreamer from git master. I'll try
>>>>>> to figure out the reproduction steps and get back to you.
>>>>>
>>>>> Thank you.
>>>>>
>>>>> Until I can figure out why this is causing problems I have dropped the
>>>>> following two patches from my queue:
>>>>> signal: Make SIGKILL during coredumps an explicit special case
>>>>> signal: Drop signals received after a fatal signal has been processed
>>>>>
>>>>> I have replaced them with the following two patches that just do what
>>>>> is needed for the rest of the code in the series:
>>>>> signal: Have prepare_signal detect coredumps using
>>>>> signal: Make coredump handling explicit in complete_signal
>>>>>
>>>>> Perversely my failure to change the SIGKILL handling when coredumps are
>>>>> happening proves to me that I need to change the SIGKILL handling when
>>>>> coredumps are happening to make the code more maintainable.
>>>>
>>>> Eric, thank you again. I started to look at the reproduction steps and
>>>> haven't completed them yet. It turned out the problem affects only the
>>>> older NVIDIA Tegra2 Cortex-A9 CPU, which lacks support for the ARM NEON
>>>> instruction set, hence the problem isn't visible on x86 and other CPUs
>>>> out of the box. I'll need to check whether the problem can be simulated
>>>> on all arches or whether it's specific to VFP exception handling on ARM32.
>>>
>>> It sounds like the gstreamer plugins only fail on certain hardware on
>>> arm32, and things don't hang in coredumps unless the plugins fail.
>>> That does make things tricky to minimize.
>>>
>>> I have just verified that the known problematic code is not
>>> in linux-next for Jan 11 2022.
>>>
>>> If folks can, as they have time, double check linux-next and verify all
>>> is well, I would appreciate it. I don't expect that there are problems but
>>> sometimes one problem hides another.
>>
>> Hello Eric,
>>
>> I reproduced the trouble on x86_64.
>>
>> Here are the reproduction steps, using Arch Linux and linux-next-20211224:
>>
>> ```
>> sudo pacman -S base-devel git mesa glu meson wget
>> git clone https://github.com/grate-driver/gstreamer.git
>> cd gstreamer
>> git checkout sigill
>> meson --prefix=/usr -Dgst-plugins-base:playback=enabled -Dgst-devtools:validate=disabled build
>> cd build
>> sudo ninja install
>> wget https://www.peach.themazzone.com/big_buck_bunny_720p_h264.mov
>> rm -r ~/.cache/gstreamer-1.0
>> gst-play-1.0 ./big_buck_bunny_720p_h264.mov
>> ```
>>
>> The SIGILL, thrown by [1], causes the hang. There is no hang with the v5.16.1 kernel.
>>
>> [1] https://github.com/grate-driver/gstreamer/commit/006f9a2ee6dcf7b31c9b5413815d6054d82a3b2f
>
> Thank you.
>
> I will verify this works before I add my updated version to
> my signal-for-v5.18 branch.
>
> Have you by any chance tried a newer version of linux-next without
> commit fbc11520b58a ("signal: Make SIGKILL during coredumps an explicit
> special case") in it?
>
> If not I will double check that my pulling the commit out does not break
> in the case you have documented.

Recent linux-next works fine.