Re: Back to the future.

From: David Lang
Date: Fri Apr 27 2007 - 21:27:27 EST


On Sat, 28 Apr 2007, Rafael J. Wysocki wrote:

On Saturday, 28 April 2007 03:03, Kyle Moffett wrote:
On Apr 27, 2007, at 18:07:46, Nigel Cunningham wrote:
Hi.

On Fri, 2007-04-27 at 14:44 -0700, Linus Torvalds wrote:
It makes it harder to debug (wouldn't it be *nice* to just ssh in,
and do
gdb -p <snapshotter>

Make the machine being suspended a VM and you can already do that.

when something goes wrong?) but we also *depend* on user space for
various things (the same way we depend on kernel threads, and why
it has been such a total disaster to try to freeze the kernel
threads too!). For example, if you want to do graphical stuff,
just using X would be quite nice, wouldn't it?

But in doing so you make the contents of the disk inconsistent with
the state you've just snapshotted, leading to filesystem
corruption. Even if you modify filesystems to do checkpointing
(which is what we're really talking about), you still also have the
problem that your snapshot has to be stored somewhere before you
write it to disk, so you also have to either [snip]

Actually, it's a lot simpler than that. We can just combine the
device-mapper snapshot with a VM+kernel snapshot system call and be
almost done:

sys_snapshot(dev_t snapblockdev, int __user *snapshotfd);

When sys_snapshot is run, the kernel does:

1) Sequentially freeze mounted filesystems using blockdev freezing.
If it's an fs that doesn't support freezing then either fail or force-
remount-ro that fs and downgrade all its filedescriptors to RO.
Doesn't need extra locking since processes which try to do IO either
succeed before the freeze call returns for that blockdev or sleep on
the unfreeze of that blockdev. Filesystems are synchronized and made
clean.
2) Iterate over the userspace process list, freezing each process
and remapping all of its pages copy-on-write. Any device-specific
pages need to have state saved by that device.

Why do you want to do 2) after 1) and not vice versa?

It doesn't really matter. If you care, just arrange not to schedule user processes while you are doing both steps.

3) All processes (except kernel threads) are now frozen.
4) Kernel should save internal state corresponding to current
userspace state. The kernel also swaps out excess pages to free up
enough RAM and prepares the snapshot file-descriptor with copies of
kernel memory and the original (pre-COW) mapped userspace pages.
5) Kernel replaces the filesystems with either a device-mapper
snapshot with snapblockdev as backing storage or a union with tmpfs, and
remounts the underlying filesystems as read-only.
6) Kernel unfreezes all userspace processes and returns the snapshot
FD to userspace (where it can be read from).
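
To make the locking argument in step 1 concrete, here is a rough userspace toy model in Python, not kernel code. It only illustrates the claim that a writer either completes before the freeze call returns or sleeps until the device is unfrozen. The class name FreezableDevice and its methods are made up for this sketch and do not correspond to any kernel interface.

```python
import threading


class FreezableDevice:
    """Toy model of a block device: writers sleep while it is frozen."""

    def __init__(self):
        self._cond = threading.Condition()
        self._frozen = False
        self.log = []  # stands in for the data that reached the device

    def freeze(self):
        # Taking the lock means any write already in progress finishes
        # before freeze() returns.
        with self._cond:
            self._frozen = True

    def thaw(self):
        with self._cond:
            self._frozen = False
            self._cond.notify_all()  # wake every sleeping writer

    def write(self, data, timeout=None):
        # Sleep until the device is thawed; return False if we gave up.
        with self._cond:
            if not self._cond.wait_for(lambda: not self._frozen, timeout):
                return False
            self.log.append(data)
            return True
```

In this model no writer needs a lock of its own beyond the one the device already holds, which is the point being made above; whether the same holds for every real filesystem is a separate question.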

Okay, but how do we do the error recovery if, for example, the image cannot
be saved?

Give the user an error message saying so, wait for confirmation, and then jump directly to the restore step: revert everything to the snapshot image(s) and restart.
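
As a minimal sketch of that control flow (a Python toy model, not a proposed implementation): the step names, SnapshotError, take_snapshot and the confirm callback are all hypothetical and exist only to show the ordering of the steps and the fallback to restore when saving the image fails.

```python
class SnapshotError(Exception):
    """Raised by the (hypothetical) image writer when saving fails."""


# Steps 1-6 from the proposal, reduced to labels for illustration.
STEPS = [
    "freeze_filesystems",
    "freeze_processes_cow",
    "save_kernel_state",
    "switch_to_snapshot_storage",
    "thaw_processes",
]


def take_snapshot(save_image, confirm=lambda msg: None):
    """Run the steps in order, then try to save the image.

    On failure: tell the user, wait for confirmation, then go straight
    to the restore path instead of powering down.
    """
    trace = list(STEPS)
    try:
        save_image()
        trace.append("image_saved")
    except SnapshotError as err:
        confirm(f"snapshot image could not be saved: {err}")
        trace.append("restore_from_snapshot")
        trace.append("restart")
    return trace
```

The failure path reuses the normal restore path, so under this assumption no separate unwind logic is needed for a failed save.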

David Lang