Re: hung_task checking and sys_sync

From: Muthu Kumar
Date: Tue Jun 12 2012 - 21:03:17 EST


On Tue, Jun 12, 2012 at 3:57 PM, Daniel Walker <dwalker@xxxxxxxxxx> wrote:
> On Tue, Jun 12, 2012 at 03:45:20PM -0700, Mandeep Baines wrote:
>> On Tue, Jun 12, 2012 at 3:34 PM, Daniel Walker <dwalker@xxxxxxxxxx> wrote:
>> > On Tue, Jun 12, 2012 at 03:29:12PM -0700, Mandeep Singh Baines wrote:
>> >>
>> >> But the time is not unbounded. You could mask the hung_task_detector for
>> >> this case but then you lose the ability to catch bugs in this code path.
>> >>
>> >> The timeout is configurable via /proc/sys/kernel/hung_task_timeout_secs.
>> >> Can you bump up the value at boot via sysctl.conf?
>> >
>> > Maybe, but I'm wondering if these types should just be stopped because Andrew
>> > had complained about them already.
>> >
>>
>> Fair enough. Actually, internally I had a patch where we'd use a task
>> flag to disable and enable the hang check but the approach in the
>> patch you pointed me to seems better.
>
> I'm not really in love with it actually.. It's not ifdef'd for one, but
> it's also changing potentially good kernel behavior to avoid warnings.
>
I totally agree with you (except for the ifdef part :). The change you
mention was actually masking a potential problem; see
https://lkml.org/lkml/2012/6/6/483. Without that change, we would have
gotten a hung task message in the case where blk_execute_rq() gets
stuck forever because its completion is never called.
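
To make the masking concrete, here is a rough userspace model (not
kernel code). It assumes the detector reports a task when its
context-switch count has not changed between two scans, which is my
understanding of how the check works. The names sim_task,
sim_check_hung_task() and sim_stuck_waiter() are made up for this
sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a blocked task as the hung task detector sees it. */
struct sim_task {
	unsigned long switch_count;      /* bumped each time the task runs */
	unsigned long last_switch_count; /* value recorded at the previous scan */
};

/* One detector scan: report a hang if the task has not run since the last scan. */
static bool sim_check_hung_task(struct sim_task *t)
{
	if (t->switch_count != t->last_switch_count) {
		t->last_switch_count = t->switch_count;
		return false;
	}
	return true;
}

/*
 * Simulate a waiter whose completion never arrives, over 'scans' detector
 * periods of 'timeout' seconds each. 'wake_interval' is how often the waiter
 * wakes up from a timed wait (0 models a plain untimed wait).
 * Returns the number of scans that reported a hang.
 */
static int sim_stuck_waiter(unsigned int timeout, unsigned int wake_interval,
			    unsigned int scans)
{
	/* Equal counts: the task was already seen once by an earlier scan. */
	struct sim_task t = { .switch_count = 1, .last_switch_count = 1 };
	int reports = 0;

	assert(timeout > 0);
	for (unsigned int now = 1; now <= timeout * scans; now++) {
		if (wake_interval && now % wake_interval == 0)
			t.switch_count++; /* timed wait expired, task ran briefly */
		if (now % timeout == 0 && sim_check_hung_task(&t))
			reports++;
	}
	return reports;
}
```

With a 120 second timeout, sim_stuck_waiter(120, 0, 3) reports on every
scan, while sim_stuck_waiter(120, 60, 3) (waking at half the period, as
in the block change) never reports, even though the completion never
arrives in either case.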



>> >> > Has there been any commit that disables these messages for bdi_sched_wait?
>> >> >
>> >>
>> >> No. There is no mechanism to disable hung_task for a specific code path.
>> >> We do skip processes if PF_FROZEN or PF_FREEZER_SKIP is set but that is
>> >> really a different situation where the wait is unbounded.
>> >
>> > There is precedent for this type of change,
>> >
>> > Author: Mark Lord <kernel@xxxxxxxxxxxx>
>> > Date:   Fri Sep 24 09:51:13 2010 -0400
>> >
>> >    block: Prevent hang_check firing during long I/O
>> >
>> >    During long I/O operations, the hang_check timer may fire,
>> >    trigger stack dumps that unnecessarily alarm the user.
>> >
>> >    Eg.  hdparm --security-erase NULL /dev/sdb  ## can take *hours* to complete
>> >
>> >    So, if hang_check is armed, we should wake up periodically
>> >    to prevent it from triggering.  This patch uses a wake-up interval
>> >    equal to half the hang_check timer period, which keeps overhead low enough.
>> >
>> >    Signed-off-by: Mark Lord <mlord@xxxxxxxxx>
>> >    Signed-off-by: Jens Axboe <jaxboe@xxxxxxxxxxxx>
>> >
>>
>> Interesting. I wasn't aware of this patch. Maybe we could abstract
>> this approach via wait_for_completion_no_hang_check().
>
> Could be .. You could put a stack structure into a list of tasks that
> should be ignored prior to the task sleeping. Then when the thread wakes
> the stack structure could be removed. Then that list gets checked
> during the hung task checking.
>
> Daniel
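
If I read the list idea correctly, it could look something like the
userspace sketch below. Everything here is hypothetical (struct
hang_ignore, hang_ignore_add(), hang_ignore_del(),
hang_check_should_skip()); none of it exists in the kernel, and a real
version would need locking around the list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical entry; it lives on the stack of the task about to sleep. */
struct hang_ignore {
	int pid;
	struct hang_ignore *prev, *next;
};

/* Circular list head. A real version would need a lock around it. */
static struct hang_ignore ignore_list = { -1, &ignore_list, &ignore_list };

/* Called by the waiter just before it goes to sleep. */
static void hang_ignore_add(struct hang_ignore *e, int pid)
{
	assert(e != NULL);
	e->pid = pid;
	e->next = ignore_list.next;
	e->prev = &ignore_list;
	ignore_list.next->prev = e;
	ignore_list.next = e;
}

/* Called by the waiter right after it wakes up. */
static void hang_ignore_del(struct hang_ignore *e)
{
	assert(e != NULL);
	e->prev->next = e->next;
	e->next->prev = e->prev;
	e->prev = e->next = e;
}

/* Called by the hung task scan: true if this task asked to be ignored. */
static bool hang_check_should_skip(int pid)
{
	for (struct hang_ignore *e = ignore_list.next; e != &ignore_list; e = e->next)
		if (e->pid == pid)
			return true;
	return false;
}
```

A waiter would declare a struct hang_ignore on its stack, call
hang_ignore_add() before sleeping and hang_ignore_del() after waking.
Note that a wait which never completes stays on the list and is never
reported, so this would hide the same class of problem as the timed
wait does.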
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/