Re: [RFC][PATCH][EXPERIMENTAL] Make kernel threads nonfreezable by default

From: Nigel Cunningham
Date: Tue May 29 2007 - 08:59:57 EST


Hi.

On Tue, 2007-05-29 at 14:15 +0200, Rafael J. Wysocki wrote:
> Please have a look at the current version of the patch (appended).
>
> I have followed the Nigel's suggestion not to change the current behavior
> in this patch (I'll add a couple of patches removing the freezability from
> some kernel threads), with one exception: I couldn't figure out any reason
> to have try_to_freeze() called in net/sunrpc/svcsock.c:svc_recv() .

Thanks. IIRC, svcsock is related to the NFS server code.

> I've also added a piece of documentation, freezing-of-tasks.txt . Please
> see if it's not missing anything (I'd like it to be quite complete).

[...]

Mostly just grammar and the odd typo. On the whole, it's really well
written and perfectly readable - great job!

> Index: linux-2.6.22-rc3/Documentation/power/freezing-of-tasks.txt
> ===================================================================
> --- /dev/null
> +++ linux-2.6.22-rc3/Documentation/power/freezing-of-tasks.txt
> @@ -0,0 +1,160 @@
> +Freezing of tasks
> + (C) 2007 Rafael J. Wysocki <rjw@xxxxxxx>, GPL
> +
> +I. What is the freezing of tasks?
> +
> +The freezing of tasks is a mechanism by which user space processes and some
> +kernel threads are controlled during hibernation or system-wide suspend (on some
> +architectures).
> +
> +II. How it works?

How does it work?

> +
> +There are four per-task flags used for that, PF_NOFREEZE, PF_FROZEN, TIF_FREEZE
> +and PF_FREEZER_SKIP (the last one is auxiliary). The tasks that have
> +PF_NOFREEZE unset (all user space processes and some kernel threads) are
> +regarded as 'freezable' and treated in a special way before the system enters a
> +suspend state as well as before a hibernation image is created (in what follows
> +we only consider hibernation, but the description also applies to suspend).
> +
> +Namely, as the first step of the hibernation procedure the function
> +freeze_processes() (defined in kernel/power/process.c) is called. It executes
> +try_to_freeze_tasks() that sets TIF_FREEZE for all of the freezable tasks and
> +sends a fake signal to each of them. A task that receives such a signal and has
> +TIF_FREEZE set, should react to it by calling the refrigerator() function
> +(defined in kernel/power/process.c), which sets the task's PF_FROZEN flag,
> +changes its state to TASK_UNINTERRUPTIBLE and makes it loop until PF_FROZEN is
> +cleared for it. Then, we say that the task is 'frozen' and therefore the set of
> +functions handling this mechanism is called 'the freezer' (these functions are
> +defined in kernel/power/process.c and include/linux/freezer.h). User space
> +processes are generally frozen before kernel threads.
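
For readers following along, the handshake described in the paragraph above
can be modelled in user space. This is only a rough sketch under stated
assumptions, not the kernel code: the model_* names and M_* flag values are
made up for illustration, there is no fake signal or scheduling, and clearing
the freeze request once the task is frozen is an assumption of the model
rather than something the text states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Model-only flag bits; the names echo the text, the values are arbitrary. */
enum {
	M_NOFREEZE = 1u << 0,	/* stands in for PF_NOFREEZE */
	M_FROZEN   = 1u << 1,	/* stands in for PF_FROZEN */
	M_FREEZE   = 1u << 2,	/* stands in for TIF_FREEZE */
};

/* Hypothetical stand-in for a task; only the flags matter here. */
struct model_task {
	const char *name;
	unsigned int flags;
};

/* A task is freezable when its NOFREEZE bit is unset. */
static bool model_freezable(const struct model_task *t)
{
	return !(t->flags & M_NOFREEZE);
}

/*
 * Rough analogue of try_to_freeze_tasks(): set the freeze request on
 * every freezable task and report how many requests were made.
 */
static size_t model_request_freeze(struct model_task *tasks, size_t n)
{
	size_t i, requested = 0;

	for (i = 0; i < n; i++) {
		if (model_freezable(&tasks[i])) {
			tasks[i].flags |= M_FREEZE;
			requested++;
		}
	}
	return requested;
}

/*
 * Rough analogue of try_to_freeze() followed by refrigerator(): if a
 * freeze was requested, mark the task frozen. Returns true if the task
 * became frozen. Clearing M_FREEZE here is a modelling assumption.
 */
static bool model_try_to_freeze(struct model_task *t)
{
	if (!(t->flags & M_FREEZE))
		return false;
	assert(model_freezable(t));
	t->flags &= ~(unsigned int)M_FREEZE;
	t->flags |= M_FROZEN;
	return true;
}

/* Freezing has succeeded only once every freezable task is frozen. */
static bool model_all_frozen(const struct model_task *tasks, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (model_freezable(&tasks[i]) && !(tasks[i].flags & M_FROZEN))
			return false;
	}
	return true;
}

/* Rough analogue of thaw_processes(): clear the frozen bit everywhere. */
static void model_thaw(struct model_task *tasks, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		tasks[i].flags &= ~(unsigned int)M_FROZEN;
}
```

In this model a task created with M_NOFREEZE never receives a request, and
model_all_frozen() stays false until every other task has gone through
model_try_to_freeze(), which is the condition the text describes for the
freezing of tasks to succeed.
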
> +
> +It is not recommended to call refrigerator() directly. Instead, it is
> +recommended to use the try_to_freeze() function (defined in
> +include/linux/freezer.h), that checks the task's TIF_FREEZE flag and makes the
> +task enter refrigerator() if the flag is set.
> +
> +For user space processes try_to_freeze() is called automatically from the
> +signal-handling code, but the freezable kernel threads need to call it
> +explicitly in suitable places. The code to do this may look like the following:
> +
> + do {
> + hub_events();
> + wait_event_interruptible(khubd_wait,
> + !list_empty(&hub_event_list));
> + try_to_freeze();
> + } while (!signal_pending(current));
> +
> +(from drivers/usb/core/hub.c::hub_thread()).
> +
> +If a freezable kernel thread fails to call try_to_freeze() after the freezer has
> +set TIF_FREEZE for it, the freezing of tasks will fail and the entire
> +hibernation operation will be cancelled. For this reason, freezable kernel
> +threads must call try_to_freeze() somewhere.
> +
> +After the system memory state has been restored from a hibernation image and
> +devices have been reinitialized, the function thaw_processes() is called in
> +order to clear the PF_FROZEN flag for each frozen task. Then, the tasks that
> +have been frozen leave refrigerator() and continue running.
> +
> +III. Which kernel threads are freezable?
> +
> +Kernel threads are not freezable by default. However, a kernel thread may clear
> +PF_NOFREEZE for itself by calling set_freezable() (the resetting of PF_NOFREEZE
> +directly is strongly discouraged). From this point it is regarded as freezable
> +and must call try_to_freeze() in a suitable place.
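
The opt-in described in this section could be illustrated along the following
lines. Again, this is a user-space sketch and not kernel code: struct
model_kthread and every model_* function are hypothetical names invented for
the example, and the real refrigerator() loops until the task is thawed,
whereas here a frozen thread simply skips its work for that iteration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one kernel thread's freezer-related state. */
struct model_kthread {
	bool nofreeze;		/* models PF_NOFREEZE */
	bool freeze_requested;	/* models TIF_FREEZE */
	bool frozen;		/* models PF_FROZEN */
	int work_done;		/* iterations that actually did work */
};

/* Kernel threads start out nonfreezable, as section III states. */
static void model_kthread_init(struct model_kthread *k)
{
	k->nofreeze = true;
	k->freeze_requested = false;
	k->frozen = false;
	k->work_done = 0;
}

/* Models set_freezable(): the thread clears its own NOFREEZE state. */
static void model_set_freezable(struct model_kthread *k)
{
	k->nofreeze = false;
}

/* Freezer side: a request is only placed on freezable threads. */
static bool model_freezer_request(struct model_kthread *k)
{
	if (k->nofreeze)
		return false;
	k->freeze_requested = true;
	return true;
}

/* Freezer side: thawing clears the frozen state. */
static void model_freezer_thaw(struct model_kthread *k)
{
	k->frozen = false;
}

/*
 * One pass of the thread's main loop: check for a freeze request first
 * (the try_to_freeze() step), then do work only if not frozen.
 */
static void model_loop_iteration(struct model_kthread *k)
{
	if (k->freeze_requested) {
		assert(!k->nofreeze);
		k->freeze_requested = false;
		k->frozen = true;
	}
	if (k->frozen)
		return;		/* the real refrigerator() would loop here */
	k->work_done++;
}
```

The point of the sketch is the ordering: until model_set_freezable() has
been called the freezer leaves the thread alone, and afterwards the thread
itself has to reach the check in its loop for the request to take effect.
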
> +
> +IV. Why do we do that?
> +
> +Generally speaking, there is a couple of reasons to use the freezing of tasks:
> +
> +1. The principal reason is to prevent filesystems from being damaged after
> +hibernation. Namely, for now we have no simple means of checkpointing

s/Namely, for now/At the moment/

No simple means or no means at all? Are you thinking of bdev freezing?

> +filesystems, so if there are any modifications made to filesystem data and/or
> +metadata on disks, we usually cannot bring them back to the state from before

If the above is changed, I'd remove 'usually' here.

> +the modifications. At the same time each hibernation image contains some
> +filesystem-related information that must be consistent with the state of the
> +on-disk data and metadata after the system memory state has been restored from
> +the image (otherwise the filesystems will be damaged in a nasty way, usually
> +making them almost impossible to repair). Therefore we freeze tasks that might

s/Therefore we/We therefore/

> +cause the on-disk filesystems' data and metadata to be modified after the
> +hibernation image has been created and before the system is finally powered off.
> +The majority of them is user space processes, but if any of kernel threads may

s/them is/these are/

s/of kernel/of the kernel/

> +cause something like this to happen, they have to be freezable.
> +
> +2. The second reason is to prevent user space processes and some kernel threads
> +from interfering with the suspending and resuming of devices. For example, a
> +user space process running on a second CPU while we are suspending devices may

I'd shift the "For example" to after "may", giving "...may, for example,
be troublesome..."

> +be troublesome and without the freezing of tasks we would need some safeguards
> +against race conditions that might occur in such a case.
> +
> +Although Linus Torvalds doesn't like the freezing of tasks, he said this in one
> +of the discussions on LKML (http://lkml.org/lkml/2007/4/27/608):
> +
> +'> Why we freeze tasks at all or why we freeze kernel threads?
> +
> +In many ways, "at all".

I found these first two lines confusing - I thought the "Why we
freeze..." was Linus, rather than a quotation he was responding to. I'd
suggest starting the quote at what follows this point... but then as I
read further, I can see the quote is necessary to make sense of the
second paragraph below. Perhaps the best way would be to put a line before
the "Why we freeze..." indicating that you're being quoted there.

> +I _do_ realize the IO request queue issues, and that we cannot actually do
> +s2ram with some devices in the middle of a DMA. So we want to be able to
> +avoid *that*, there's no question about that. And I suspect that stopping
> +user threads and then waiting for a sync is practically one of the easier
> +ways to do so.
> +
> +So in practice, the "at all" may become a "why freeze kernel threads?" and
> +freezing user threads I don't find really objectionable.'

Oh, and double quotes should surround the whole quote, with single
quotes replacing the double quotes in the quotation. Hope all those
'quote's aren't confusing! :)

> +Still, there are kernel threads that may want to be freezable. For example, if
> +a kernel thread that belongs to a device driver accesses the device directly, it in
> +principle needs to know when the device is suspended, so that it doesn't try to
> +access it at that time. However, if the kernel thread is freezable, it will be
> +frozen before the driver's .suspend() callback is executed and it will be
> +thawed after the driver's .resume() callback has run, so it won't be accessing
> +the device while it's suspended.
> +
> +3. Another reason for freezing tasks is to prevent user space processes from
> +realizing that hibernation (or suspend) operation takes place. Ideally, user
> +space processes should not notice that such a system-wide operation has occured

s/occured/occurred/. That word gets me too.

> +and should continue running without any problems after the restore (or resume
> +from suspend). Unfortunately, in the most general case this is quite difficult
> +to achieve without the freezing of tasks. Consider, for example, a process
> +that depends on the number of CPUs being online while it's running. Since we

s/the number of/all/ (or secondary)

> +need to disable nonboot CPUs during the hibernation, if this process is not
> +frozen, it may notice that the number of CPUs has changed and may start to work
> +incorrectly because of that.
> +
> +V. Are there any problems related to the freezing of tasks?
> +
> +Yes, there are.
> +
> +First of all, the freezing of kernel threads may be tricky if they depend one
> +on another. For example, if kernel thread A waits for a completion (in the
> +TASK_UNINTERRUPTIBLE state) that needs to be done by freezable kernel thread B
> +and B is frozen in the meantime, then A will be blocked until B is thawed, which
> +may be undesirable. That's why kernel threads are not freezable by default.
> +
> +Second, there are the following two problems related to the freezing of user
> +space processes:
> +1. Putting processes into an uninterruptible sleep stuffs up the load average.

s/stuffs up/distorts/ ('Stuffs up' is accurate as a colloquialism, but
I'm suggesting the change because the language in the remainder of the
file is more formal - this seems out of place).

> +2. Now that we have FUSE, plus the framework for doing device drivers in
> +userspace, it gets even more complicated because some userspace processes are
> +now doing the sorts of things that kernel threads do
> +(https://lists.linux-foundation.org/pipermail/linux-pm/2007-May/012309.html).

Death to them all, I say! :)

> +The problem 1. seems to be fixable, although it hasn't been fixed so far. The
> +other one is more serious, but it seems that we can work around it by using
> +hibernation (and suspend) notifiers (in that case, though, we won't be able to
> +avoid the realization by the user space processes that the hibernation is taking
> +place).
> +
> +There also are problems that the freezing of tasks tends to expose, although

s/also are/are also/

> +they are not directly related to it. For example, if request_firmware() is
> +called from a device driver's .resume() routine, it will timeout and eventually
> +fail, because the user land process that should respond to the request is frozen
> +at this point. So, seemingly, the failure is due to the freezing of tasks.
> +Suppose, however, that the firmware file is located on a filesystem accessible
> +only through the device that needs the firmware. In that case, the system won't
> +be able to work normally after the restore regardless of whether or not the
> +freezing of tasks is used. Consequently, the problem is not really related to
> +the freezing of tasks, since it generally exists regardless. [The solution to
> +this particular problem is to keep the firmware in memory after it's loaded for
> +the first time and upload it from memory to the device whenever necessary.]

I understand the logic and agree with what you're trying to say in this
last example, but I think the example is faulty. If the firmware is on a
filesystem accessible only through the device that needs the firmware,
then you wouldn't be able to bring it up in the first place.

Regards,

Nigel
