Re: [PATCH] hung_task: add warning counter to blocked task report

From: Lance Yang
Date: Mon Jul 21 2025 - 02:19:36 EST




On 2025/7/21 13:45, Ye Liu wrote:


On 2025/7/21 12:56, Lance Yang wrote:
Hi Ye,

Thanks for your patch!

On 2025/7/21 11:17, Ye Liu wrote:
From: Ye Liu <liuye@xxxxxxxxxx>

Add a warning counter to each hung task message to make it easier
to analyze and locate issues in the logs.

Signed-off-by: Ye Liu <liuye@xxxxxxxxxx>
---
  kernel/hung_task.c | 6 ++++--
  1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 8708a1205f82..9e5f86148d47 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -58,6 +58,7 @@ EXPORT_SYMBOL_GPL(sysctl_hung_task_timeout_secs);
  static unsigned long __read_mostly sysctl_hung_task_check_interval_secs;
    static int __read_mostly sysctl_hung_task_warnings = 10;
+static int hung_task_warning_count;
    static int __read_mostly did_panic;
  static bool hung_task_show_lock;
@@ -232,8 +233,9 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
      if (sysctl_hung_task_warnings || hung_task_call_panic) {
          if (sysctl_hung_task_warnings > 0)
              sysctl_hung_task_warnings--;
-        pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n",
-               t->comm, t->pid, (jiffies - t->last_switch_time) / HZ);
+        pr_err("INFO: task %s:%d blocked for more than %ld seconds. [Warning #%d]\n",
+               t->comm, t->pid, (jiffies - t->last_switch_time) / HZ,
+               ++hung_task_warning_count);
          pr_err("      %s %s %.*s\n",
              print_tainted(), init_utsname()->release,
              (int)strcspn(init_utsname()->version, " "),

A quick thought on this: we already have the hung_task_detect_count
counter, which tracks the total number of hung tasks detected since
boot ;)

While this patch adds a counter inline with the warning message, the
existing counter already provides a way to know how many hung task
events have occurred.

Could you elaborate on the specific benefit of printing this count
directly in the log, compared to checking the global hung_task_detect_count?

Also, if the goal is to give each warning a unique sequence number,
I think the dmesg timestamp prefix serves the same purpose ;)
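
For reference, the existing counter can be checked from userspace without
touching the log format. A minimal sketch, assuming the counter is exposed
through the kernel.hung_task_detect_count sysctl, i.e. the file
/proc/sys/kernel/hung_task_detect_count. The helper name read_ulong_file()
is made up for illustration and is not part of any kernel or libc API:

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Read one unsigned decimal value from a procfs-style file.
 * Hypothetical helper: returns 0 on success, -1 on any failure.
 */
static int read_ulong_file(const char *path, unsigned long *out)
{
	char buf[64];
	char *end;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	errno = 0;
	*out = strtoul(buf, &end, 10);
	if (errno || end == buf)	/* overflow, or no digits at all */
		return -1;
	return 0;
}
```

Calling read_ulong_file("/proc/sys/kernel/hung_task_detect_count", &count)
before and after an incident would give the number of hung tasks detected
in between, provided the running kernel has that sysctl.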

Thanks,
Lance

Sorry for not noticing sysctl_hung_task_detect_count.
I just thought adding it directly to the warning message would make the
log easier to read and more intuitive than relying on timestamps.

If accepted, I will send V2, like this:

Let's step back and consider the practical use case. When we are
troubleshooting hung task issues in a production log, what information
do we actually use?

Typically, we look for:
1) The timestamp, to correlate with other system events
2) The task name and PID (%s:%d)
3) The kernel stack trace that follows, to see where it's stuck
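
The first two items can already be pulled out of the current message
format mechanically. A rough sketch, assuming dmesg-style lines with a
"[seconds.micros]" prefix; struct hung_report and parse_hung_line() are
hypothetical names used only for this illustration:

```c
#include <stdio.h>
#include <string.h>

struct hung_report {
	double ts;	/* dmesg timestamp in seconds, -1.0 if absent */
	char comm[16];	/* task name; the kernel's TASK_COMM_LEN is 16 */
	int pid;
	long secs;	/* "blocked for more than N seconds" */
};

/* Parse one hung task report line. Returns 0 on success, -1 otherwise. */
static int parse_hung_line(const char *line, struct hung_report *r)
{
	const char *msg, *tail, *p, *colon = NULL;
	size_t len;

	msg = strstr(line, "INFO: task ");
	if (!msg)
		return -1;
	if (sscanf(line, "[%lf]", &r->ts) != 1)
		r->ts = -1.0;

	msg += strlen("INFO: task ");
	tail = strstr(msg, " blocked for more than ");
	if (!tail)
		return -1;

	/* comm may itself contain ':' (e.g. kworker/u8:2), so use the last one */
	for (p = msg; p < tail; p++)
		if (*p == ':')
			colon = p;
	if (!colon)
		return -1;

	len = (size_t)(colon - msg);
	if (len == 0 || len >= sizeof(r->comm))
		return -1;
	memcpy(r->comm, msg, len);
	r->comm[len] = '\0';

	if (sscanf(colon + 1, "%d blocked for more than %ld seconds",
		   &r->pid, &r->secs) != 2)
		return -1;
	return 0;
}
```

Anything after "seconds" is ignored, so the same parser would keep working
if a suffix were ever appended to the message.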

So, my question is: in what specific troubleshooting scenario would
knowing the sequence number, like [#N], provide actionable information
that the above data points do not?

Unless there's a compelling use case I'm missing, I'd prefer to keep
the code as it is ;)
Thanks,
Lance


diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 8708a1205f82..231afdb68bb2 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -232,8 +232,9 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
      if (sysctl_hung_task_warnings || hung_task_call_panic) {
          if (sysctl_hung_task_warnings > 0)
              sysctl_hung_task_warnings--;
-        pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n",
-               t->comm, t->pid, (jiffies - t->last_switch_time) / HZ);
+        pr_err("INFO: task %s:%d blocked for more than %ld seconds. [#%ld]\n",
+               t->comm, t->pid, (jiffies - t->last_switch_time) / HZ,
+               sysctl_hung_task_detect_count);
          pr_err("      %s %s %.*s\n",
              print_tainted(), init_utsname()->release,
              (int)strcspn(init_utsname()->version, " "),