Re: Back to the future.

From: Linus Torvalds
Date: Thu Apr 26 2007 - 12:57:35 EST

On Thu, 26 Apr 2007, Nigel Cunningham wrote:
>
> * Doing things in the right order? (Prepare the image, then do the
> atomic copy, then save).

I'd actually like to discuss this a bit..

I'm obviously not a huge fan of the whole user/kernel level split and
interfaces, but I actually do think that there is *one* split that makes
sense:

 - generate the (whole) snapshot image entirely inside the kernel

 - do nothing else (ie no IO at all), and just export it as a single image
   to user space (literally just mapping the pages into user space).
   *one* interface. None of the "pretty UI update" crap. Just a single
   system call:

    void *snapshot_system(u32 *size);

which will map in the snapshot and return the mapped address and the size
(and if you want to support snapshots > 4GB, be my guest, but I suspect
you're actually *better* off just admitting that if you cannot shrink
the snapshot until its size fits in 32 bits, it's not worth doing).

User space gets a fully running system, with that one process having that
one image mapped into its address space. It can then compress/write/do
whatever to that snapshot.

You need one other system call, of course, which is

    int resume_snapshot(void *snapshot, u32 size);

and for testing, you should be able to basically do

    u32 size;
    void *buffer = snapshot_system(&size);
    if (buffer != MAP_FAILED)
        resume_snapshot(buffer, size);

and it should obviously work.
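
To make the contract of those two calls concrete, here is a minimal user-space mock. Neither call exists as a real system call, so everything in it is an assumption made for illustration: the names mock_snapshot_system() and mock_resume_snapshot(), the fake_state and restored_state buffers, and the choice to back the "image" with an anonymous mmap(). It only shows the shape of the interface (map an image, hand back address and size, accept the same pair on resume), not how a kernel would build a snapshot.

```c
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS on strict compilers */
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical stand-in for "the state of the system". */
static const char fake_state[] = "pretend this is the system image";
static char restored_state[sizeof(fake_state)];

/* Mock of the proposed snapshot_system(): maps a private copy of the
 * fake state and reports its size. Returns MAP_FAILED on error. */
void *mock_snapshot_system(uint32_t *size)
{
    void *image;

    assert(size != NULL);
    image = mmap(NULL, sizeof(fake_state), PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (image == MAP_FAILED)
        return MAP_FAILED;

    memcpy(image, fake_state, sizeof(fake_state));
    *size = (uint32_t)sizeof(fake_state);
    return image;
}

/* Mock of the proposed resume_snapshot(): copies the image back into
 * restored_state and unmaps it. Returns 0 on success, -1 on error. */
int mock_resume_snapshot(void *snapshot, uint32_t size)
{
    if (snapshot == NULL || snapshot == MAP_FAILED ||
        size != sizeof(fake_state))
        return -1;

    memcpy(restored_state, snapshot, size);
    return munmap(snapshot, size);
}
```

Calling mock_snapshot_system() and passing its result straight to mock_resume_snapshot() mirrors the test sequence above: after the round trip, restored_state should compare equal to fake_state.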

And btw, the device model changes are a big part of this. Because I don't
think it's even remotely debuggable with the full suspend/resume of the
devices being part of generating the image! That freeze/snapshot/unfreeze
sequence is likely a lot more debuggable, if only because freeze/unfreeze
is actually a no-op for most devices, and snapshotting is trivial too.

Once you have that snapshot image in user space you can do anything you
want. And again: you'd have a fully working system: no degradation
*at*all*. If you're in X, then X will continue running etc even after the
snapshotting, although obviously the snapshotting will have tried to page
a lot of stuff out in order to make the snapshot smaller, so you'll likely
be crawling.

> * Multithreaded I/O (might as well use multiple cores to compress the
> image, now that we're hotplugging later).
> * Support for > 1 swap device.
> * Support for ordinary files.
> * Full image option.
> * Modular design?

I'd really suggest _just_ the "full image". Nothing else is probably ever
worth supporting. Your "snapshot to disk" wouldn't be _quite_ as simple as
"echo disk > /sys/power/state", but it should not necessarily be much
worse than

    snapshot_kernel | gzip -9 > /dev/snapshot

either (and resuming from the snapshot would just be the reverse)!
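
The user-space half of such a pipeline is mostly "push the mapped buffer at a file descriptor". A rough sketch of that step follows, assuming the image is already mapped and its size is known. The function name write_image() is made up for this example; write() and EINTR are ordinary POSIX. Compression is left to whatever sits on the other end of the descriptor (a pipe into gzip, say).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Write the whole image to fd, retrying on short writes and on EINTR.
 * Returns 0 on success, -1 on error (errno is left as set by write()). */
int write_image(int fd, const void *image, uint32_t size)
{
    const unsigned char *p = image;
    size_t left = size;

    assert(image != NULL || size == 0);

    while (left > 0) {
        ssize_t n = write(fd, p, left);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing: retry */
            return -1;
        }
        p += n;                 /* short write: advance and go again */
        left -= (size_t)n;
    }
    return 0;
}
```

The loop matters because write() on a pipe or socket may accept fewer bytes than requested, and a snapshot is large enough that this will happen in practice.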

And if you want to send the snapshot over a TCP connection to another
host, be my guest. With pretty images while it's transferring. Whatever.

Linus