Re: [PATCH] hung_task: Skip hung task detection during core dump operations

From: Lance Yang
Date: Thu Aug 14 2025 - 00:32:58 EST




On 2025/8/14 11:31, Nanji Parmar (he/him) wrote:
Hi Lance, Andrew,

Thanks for looking into this.
After checking further, we found that the following patch fixed that issue. Thank you once again.

Ah, I see. That's why I couldn't reproduce it on 6.16 kernel — the
fix was already there ;)

Thanks for digging this up!
Lance



commit b8e753128ed074fcb48e9ceded940752f6b1c19f
Author: Paul E. McKenney <paulmck@xxxxxxxxxx>
Date:   Wed Jul 24 17:51:52 2024

    exit: Sleep at TASK_IDLE when waiting for application core dump

    Currently, the coredump_task_exit() function sets the task state
    to TASK_UNINTERRUPTIBLE|TASK_FREEZABLE, which usually works well.
    But a combination of large memory and slow (and/or highly contended)
    mass storage can cause application core dumps to take more than
    two minutes, which can cause check_hung_task(), which is invoked by
    check_hung_uninterruptible_tasks(), to produce task-blocked splats.
    There does not seem to be any reasonable benefit to getting these splats.

    Furthermore, as Oleg Nesterov points out, TASK_UNINTERRUPTIBLE could
    be misleading because the task sleeping in coredump_task_exit() really
    is killable, albeit indirectly.  See the check of signal->core_state
    in prepare_signal() and the check of fatal_signal_pending()
    in dump_interrupted(), which bypass the normal unkillability of
    TASK_UNINTERRUPTIBLE, resulting in coredump_finish() invoking
    wake_up_process() on any threads sleeping in coredump_task_exit().

    Therefore, change that TASK_UNINTERRUPTIBLE to TASK_IDLE.

    Reported-by: Anhad Jai Singh <ffledgling@xxxxxxxx>
    Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
    Acked-by: Oleg Nesterov <oleg@xxxxxxxxxx>
    Cc: Jens Axboe <axboe@xxxxxxxxx>
    Cc: Christian Brauner <brauner@xxxxxxxxxx>
    Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
    Cc: "Matthew Wilcox (Oracle)" <willy@xxxxxxxxxxxxx>
    Cc: Chris Mason <clm@xxxxxx>
    Cc: Rik van Riel <riel@xxxxxxxxxxx>

diff --git a/kernel/exit.c b/kernel/exit.c
index 7430852a8571..0d62a53605df 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -428,7 +428,7 @@ static void coredump_task_exit(struct task_struct *tsk)
                        complete(&core_state->startup);

                for (;;) {
-                       set_current_state(TASK_UNINTERRUPTIBLE|TASK_FREEZABLE);
+                       set_current_state(TASK_IDLE|TASK_FREEZABLE);
                        if (!self.task) /* see coredump_finish() */
                                break;
                        schedule();
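
To make the effect of that one-line change concrete, here is a minimal
userspace sketch of the state filter that the hung task detector applies
before calling check_hung_task(), as far as I understand it. The bit values
are assumptions copied by hand for illustration (see include/linux/sched.h
for the real definitions), and state_is_watched() and
state_filter_selfcheck() are hypothetical names, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed task-state bits, for illustration only. */
#define TASK_UNINTERRUPTIBLE 0x0002u
#define TASK_WAKEKILL        0x0100u
#define TASK_NOLOAD          0x0400u
#define TASK_FREEZABLE       0x2000u

/* Composite states built from the bits above. */
#define TASK_KILLABLE (TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)
#define TASK_IDLE     (TASK_UNINTERRUPTIBLE | TASK_NOLOAD)

/*
 * Hypothetical helper: returns true if a sleeper in this state would be
 * handed to the hung-task check. Killable and no-load (idle) sleepers
 * are skipped.
 */
bool state_is_watched(unsigned int state)
{
	return (state & TASK_UNINTERRUPTIBLE) &&
	       !(state & TASK_WAKEKILL) &&
	       !(state & TASK_NOLOAD);
}

/* Small self-check comparing the state before and after the patch. */
void state_filter_selfcheck(void)
{
	/* Before the patch: plain D state, so the detector looks at it. */
	assert(state_is_watched(TASK_UNINTERRUPTIBLE | TASK_FREEZABLE));
	/* After the patch: TASK_NOLOAD is set, so the detector skips it. */
	assert(!state_is_watched(TASK_IDLE | TASK_FREEZABLE));
	/* Killable sleepers are skipped as well. */
	assert(!state_is_watched(TASK_KILLABLE));
}
```

The point of the sketch is that TASK_IDLE still contains
TASK_UNINTERRUPTIBLE, so the sleep itself is unchanged; only the extra
TASK_NOLOAD bit takes the task out of the detector's view.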

Thanks,
Nanji

On Wed, Aug 13, 2025 at 8:12 PM Lance Yang <lance.yang@xxxxxxxxx> wrote:

Hi Nanji,

Thanks for your patch!

On 2025/8/14 06:01, Andrew Morton wrote:
> On Wed, 13 Aug 2025 11:30:36 -0700 "Nanji Parmar (he/him)" <nparmar@xxxxxxxxxxxxxxx> wrote:
>
>> Tasks involved in core dump operations can legitimately block for
>> extended periods, especially for large memory processes. The hung
>> task detector should skip tasks with PF_DUMPCORE (main dumping
>> thread) or PF_POSTCOREDUMP (other threads in the group) flags to
>> avoid false positive warnings.
>>
>> This prevents incorrect hung task reports during legitimate core
>> dump generation that can take xx minutes for large processes.
>
> It isn't pleasing to be putting coredump special cases into the core of
> the hung-task detector.  Perhaps the hung task detector should get an

Yeah, adding a special case for coredumps is not a good design ;)

> equivalent to touch_softlockup_watchdog().  I'm surprised it doesn't
> already have such a thing.  Maybe it does and I've forgotten where it is.
>
> Please provide a full description of the problem, mainly the relevant
> dmesg output.  Please always provide this full description when
> addressing kernel issues, thanks.

Interestingly, I wasn't able to reproduce the hung task warning on my
machine with an SSD, even when generating a 100 GiB coredump. The process
switches between R and D states so fast that it never hits the timeout,
even with hung_task_timeout_secs set as low as 5s ;)

So it seems this isn't a general problem for all coredumps. It looks like
it only happens on systems with slow I/O, which can cause a process to
stay in a D-state for a long time.

Anyway, any task *actually* blocked on I/O for that long should be
flagged; that is the hung task detector's job, IMHO.
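
To illustrate why fast R/D switching never trips the timeout, here is a
rough userspace model of the per-task check, assuming the detector compares
a context-switch counter between scans and only reports a task whose counter
has not moved for a full timeout. struct task_model, scan_task() and the
field names are hypothetical and simplified (time is in plain seconds); the
model also assumes the task is in a watched D state at every scan.

```c
#include <stdbool.h>

/* Hypothetical, simplified per-task bookkeeping. */
struct task_model {
	unsigned long switch_count;      /* context switches so far */
	unsigned long last_switch_count; /* value seen at the previous scan */
	unsigned long last_switch_time;  /* time (s) the counter last changed */
};

/*
 * Hypothetical scan step: returns true if the task would be reported as
 * hung at time 'now' (seconds) for the given timeout (seconds).
 */
bool scan_task(struct task_model *t, unsigned long now, unsigned long timeout)
{
	if (t->switch_count != t->last_switch_count) {
		/* The task was scheduled since the last scan: restart the clock. */
		t->last_switch_count = t->switch_count;
		t->last_switch_time = now;
		return false;
	}
	/* No scheduling activity: report once a full timeout has elapsed. */
	return now - t->last_switch_time >= timeout;
}
```

In this model, a dumper on fast storage keeps bumping switch_count between
scans, so the clock is restarted every time. Only a task that sits in D
state without being scheduled for the whole timeout gets reported, which
matches what I saw on the SSD box.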

Thanks,
Lance