Re: [RFC v10][PATCH 08/13] Dump open file descriptors

From: Oren Laadan
Date: Mon Dec 01 2008 - 16:22:14 EST

Dave Hansen wrote:
> On Mon, 2008-12-01 at 15:23 -0500, Oren Laadan wrote:
>> Verifying that the size doesn't change does not ensure that the table's
>> contents remained the same, so we can still end up with obsolete data.
> With the realloc() scheme, we have virtually no guarantees about how the
> fdtable that we read relates to the source. All that we know is that
> the n'th fd was at this value at *some* time.
> Using the scheme that I just suggested (and you evidently originally
> used) at least guarantees that we have an atomic copy of the fdtable.

It doesn't matter if the fdtable changes while we scan the fd's or
after we're done scanning the fd's, does it ? the end result is the
same - inconsistency.

> Why is this done in two steps? It first grabs a list of fd numbers
> which needs to be validated, then goes back and turns those into 'struct
> file's which it saves off. Is there a problem with doing that
> fd->'struct file' conversion under the files->file_lock?

Again, say we get all the struct file for all those fd's. Now we have to
drop the lock, and start processing the struct files. So what if at this
point the fdtable changes ? we still end up with potentially inconsistent

We can do it atomically or non-atomically, I couldn't care less.

My argument is the following:
1) Ideally, the state should not change while we are checkpointing
2) If the state changes, then the behavior is undefined (but the kernel
should not crash, of course).
3) It should be the userspace responsibility to ensure that (1) holds
(for instance, that all tasks sharing these resources are frozen
during the checkpoint)

#2 will be an uncommon event - a programmer/user/administrator mistake.
Kernel may help enforce this rule, but catching all such cases will be
expensive as it means looping through all tasks for each new resource.

As a first step, we can ensure that all threads of a process are frozen
at checkpoint time (and prevent unfreeze until checkpoint is over). But
that will not cover a clone() with CLONE_FS. And this will be the same
for other namespaces later.


To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at