Hi Lance, Andrew,
Thanks for looking into this.
After checking further, we found that the following patch fixed that issue. Thank you once again.
commit b8e753128ed074fcb48e9ceded940752f6b1c19f
Author: Paul E. McKenney <paulmck@xxxxxxxxxx>
Date: Wed Jul 24 17:51:52 2024
exit: Sleep at TASK_IDLE when waiting for application core dump
Currently, the coredump_task_exit() function sets the task state
to TASK_UNINTERRUPTIBLE|TASK_FREEZABLE, which usually works well.
But a combination of large memory and slow (and/or highly contended)
mass storage can cause application core dumps to take more than
two minutes, which can cause check_hung_task(), which is invoked by
check_hung_uninterruptible_tasks(), to produce task-blocked splats.
There does not seem to be any reasonable benefit to getting these splats.
Furthermore, as Oleg Nesterov points out, TASK_UNINTERRUPTIBLE could
be misleading because the task sleeping in coredump_task_exit() really
is killable, albeit indirectly. See the check of signal->core_state
in prepare_signal() and the check of fatal_signal_pending()
in dump_interrupted(), which bypass the normal unkillability of
TASK_UNINTERRUPTIBLE, resulting in coredump_finish() invoking
wake_up_process() on any threads sleeping in coredump_task_exit().
Therefore, change that TASK_UNINTERRUPTIBLE to TASK_IDLE.
Reported-by: Anhad Jai Singh <ffledgling@xxxxxxxx>
Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
Acked-by: Oleg Nesterov <oleg@xxxxxxxxxx>
Cc: Jens Axboe <axboe@xxxxxxxxx>
Cc: Christian Brauner <brauner@xxxxxxxxxx>
Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Cc: "Matthew Wilcox (Oracle)" <willy@xxxxxxxxxxxxx>
Cc: Chris Mason <clm@xxxxxx>
Cc: Rik van Riel <riel@xxxxxxxxxxx>
diff --git a/kernel/exit.c b/kernel/exit.c
index 7430852a8571..0d62a53605df 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -428,7 +428,7 @@ static void coredump_task_exit(struct task_struct *tsk)
 			complete(&core_state->startup);
 
 		for (;;) {
-			set_current_state(TASK_UNINTERRUPTIBLE|TASK_FREEZABLE);
+			set_current_state(TASK_IDLE|TASK_FREEZABLE);
 			if (!self.task) /* see coredump_finish() */
 				break;
 			schedule();
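
For anyone following along, here is a minimal userspace sketch of why this one-line change silences the splats. It assumes that TASK_IDLE is defined as TASK_UNINTERRUPTIBLE|TASK_NOLOAD and that the hung task detector only considers tasks that are uninterruptible without TASK_WAKEKILL or TASK_NOLOAD set, which is my reading of kernel/hung_task.c. The numeric values are illustrative and differ between kernel versions, and hung_task_candidate() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values modeled on include/linux/sched.h.
 * Assumption: the exact numbers vary between kernel versions. */
#define TASK_UNINTERRUPTIBLE 0x0002u
#define TASK_WAKEKILL        0x0100u
#define TASK_NOLOAD          0x0400u
#define TASK_FREEZABLE       0x2000u
#define TASK_IDLE            (TASK_UNINTERRUPTIBLE | TASK_NOLOAD)

/* Hypothetical helper: models the state filter the detector applies
 * before checking a task's blocked time. Not the kernel function. */
static bool hung_task_candidate(unsigned int state)
{
	return (state & TASK_UNINTERRUPTIBLE) &&
	       !(state & TASK_WAKEKILL) &&
	       !(state & TASK_NOLOAD);
}

/* Compares the sleep state before and after the patch. */
static void compare_states(void)
{
	unsigned int before = TASK_UNINTERRUPTIBLE | TASK_FREEZABLE;
	unsigned int after = TASK_IDLE | TASK_FREEZABLE;

	assert(hung_task_candidate(before));  /* old state is checked */
	assert(!hung_task_candidate(after));  /* new state is skipped */
}
```

Under these assumptions the task still sleeps uninterruptibly, but the added TASK_NOLOAD bit takes it out of the detector's view.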
Thanks,
Nanji
On Wed, Aug 13, 2025 at 8:12 PM Lance Yang <lance.yang@xxxxxxxxx> wrote:
Hi Nanji,
Thanks for your patch!
On 2025/8/14 06:01, Andrew Morton wrote:
> On Wed, 13 Aug 2025 11:30:36 -0700 "Nanji Parmar (he/him)" <nparmar@xxxxxxxxxxxxxxx> wrote:
>
>> Tasks involved in core dump operations can legitimately block for
>> extended periods, especially for large memory processes. The hung
>> task detector should skip tasks with PF_DUMPCORE (main dumping
>> thread) or PF_POSTCOREDUMP (other threads in the group) flags to
>> avoid false positive warnings.
>>
>> This prevents incorrect hung task reports during legitimate core
>> dump generation that can take xx minutes for large processes.
>
> It isn't pleasing to be putting coredump special cases into the core of
> the hung-task detector. Perhaps the hung task detector should get an
Yeah, adding a special case for coredumps is not a good design ;)
> equivalent to touch_softlockup_watchdog(). I'm surprised it doesn't
> already have such a thing. Maybe it does and I've forgotten where it is.
>
> Please provide a full description of the problem, mainly the relevant
> dmesg output. Please always provide this full description when
> addressing kernel issues, thanks.
Interestingly, I wasn't able to reproduce the hung task warning on my
machine with an SSD, even when generating a 100 GiB coredump. The process
switches between R and D states so fast that it never hits the timeout,
even with hung_task_timeout_secs set as low as 5s ;)
So it seems this isn't a general problem for all coredumps. It looks like
it only happens on systems with slow I/O, which can cause a process to
stay in a D-state for a long time.
Anyway, any task *actually* blocked on I/O for that long should be
flagged; that is the hung task detector's job, IMHO.
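
To illustrate why a task that keeps flipping between R and D never trips the timeout, here is a rough userspace model of the check as I understand it: the detector compares the task's context-switch count against the value it saw last time, and only reports when the count has not moved for the whole timeout. The struct and function names below are hypothetical and only loosely mirror the kernel's bookkeeping; time is in seconds rather than jiffies.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of per-task bookkeeping.
 * Assumption: field names loosely mirror the kernel's, nothing more. */
struct model_task {
	unsigned long switch_count;      /* voluntary + involuntary switches */
	unsigned long last_switch_count; /* count seen at the previous check */
	unsigned long last_switch_time;  /* seconds, when the count last moved */
};

/* Returns true if the modeled task would be reported as hung at `now`. */
static bool model_check_hung(struct model_task *t, unsigned long now,
			     unsigned long timeout_secs)
{
	if (t->switch_count != t->last_switch_count) {
		/* The task was scheduled since the last check: restart the clock. */
		t->last_switch_count = t->switch_count;
		t->last_switch_time = now;
		return false;
	}
	/* No scheduling activity: report once the timeout has elapsed. */
	return now - t->last_switch_time >= timeout_secs;
}
```

In this model a dump on fast storage keeps bumping switch_count, so the clock restarts at every check, while a task stuck on slow I/O leaves the count unchanged and eventually gets reported.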
Thanks,
Lance