Re: NFS locking bug -- limited mtime resolution means nfs_lock() does not provide coherency guarantee

From: Albert D. Cahalan (acahalan@cs.uml.edu)
Date: Wed Sep 13 2000 - 18:20:42 EST


Trond Myklebust writes:
> >>>>> " " == Albert D Cahalan <acahalan@cs.uml.edu> writes:

>> 20 bits * 3 timestamps == 60 bits
>> 60 bits <= 8 bytes
>>
>> So you do need 8 bytes.
>
> Yes. Assuming that you want to implement the microseconds field on
> all 3 timestamps. For NFS I would only need that precision on mtime.
> Does anybody else see a need for it on ctime and/or atime?

If you don't implement all 3, you get an ugly asymmetry that
messes up timestamp comparisons.

You'd get ctime<mtime when both were set to the same value.
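
A rough sketch of that failure, in case it isn't obvious. Everything
here is made up for illustration (struct ts, ts_cmp and the two
store_* helpers are not kernel code); it only assumes that mtime keeps
a microsecond field while ctime is truncated to whole seconds.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical timestamp: whole seconds plus a microsecond field. */
struct ts {
        uint32_t sec;
        uint32_t usec;  /* 0..999999 */
};

/* Three-way compare: negative, zero or positive, like strcmp(). */
static int ts_cmp(struct ts a, struct ts b)
{
        if (a.sec != b.sec)
                return a.sec < b.sec ? -1 : 1;
        if (a.usec != b.usec)
                return a.usec < b.usec ? -1 : 1;
        return 0;
}

/* What a field with microsecond storage would keep (assumed mtime). */
static struct ts store_with_usec(struct ts t)
{
        assert(t.usec < 1000000);
        return t;
}

/* What a seconds-only field would keep (assumed ctime). */
static struct ts store_seconds_only(struct ts t)
{
        t.usec = 0;
        return t;
}
```

Store the same "now" through both helpers and ts_cmp() reports the
seconds-only copy as older, unless the microsecond part happened to
be zero.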

>> I thought of this, but I don't think it is safe.
>>
>> write to file X us=1
>> write to file Y us=1
>> write to file Y us=2
>> write to file Y us=3
>> write to file X us=2
>>
>> Oops, "make" on some clients will think Y is newer than X.
>>
>> Using the wrong unit would be OK, as long as it is a real
>> sub-second time unit that doesn't overflow... but then you
>> might as well convert it to microseconds.
>
> Sure, but the alternative might be to reserve some bits as an
> 'operations counter' that is incremented every time an update is
> performed. That way you might have millisecond resolution
> (which is the sort of resolution that 'jiffies' & friends offer)
> + a few extra bits that would be there for make/NFS/... update
> consistency.

No way. You still get the same errors seen on clients that don't
know you are abusing the time stamps.
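
Here is the X/Y sequence from above as a sketch. The assumptions are
mine: a per-file update counter that restarts each second, stored
where a client expects sub-second time. The names (fake_inode,
fake_write, looks_newer) are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical inode: seconds plus a "sub-second" field that is
 * really a per-file update counter. */
struct fake_inode {
        uint32_t mtime_sec;
        uint32_t mtime_sub;
};

/* Record one write at wall-clock second now_sec. */
static void fake_write(struct fake_inode *ino, uint32_t now_sec)
{
        assert(ino != 0);
        if (ino->mtime_sec != now_sec) {
                ino->mtime_sec = now_sec;
                ino->mtime_sub = 0;     /* new second, restart counter */
        }
        ino->mtime_sub++;
}

/* What a client does if it treats mtime_sub as real time:
 * returns 1 when a looks strictly newer than b. */
static int looks_newer(const struct fake_inode *a,
                       const struct fake_inode *b)
{
        if (a->mtime_sec != b->mtime_sec)
                return a->mtime_sec > b->mtime_sec;
        return a->mtime_sub > b->mtime_sub;
}
```

Write X, Y, Y, Y, X within one second: X ends up with 2 and Y with 3,
so looks_newer(&y, &x) is true even though X was written last.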

The ext2 inode has 6 obviously free bytes, 6 that are only used
on filesystems marked as Hurd-type, and 8 that seem to be claimed
by competing security and EA projects. So, being wasteful, it would
be possible to have picoseconds on all 4 time stamps. Doing mere
milliseconds on 3 timestamps would leave us 16 bytes for security.
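
The arithmetic behind those numbers, as a small sketch. The helper
names are hypothetical; the only inputs are the resolution (units per
second) and the number of timestamps, rounded up to whole bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest number of bits that can hold 0 .. units_per_sec-1. */
static unsigned bits_needed(uint64_t units_per_sec)
{
        unsigned bits = 0;

        assert(units_per_sec > 0);
        while (bits < 64 && (UINT64_C(1) << bits) < units_per_sec)
                bits++;
        return bits;
}

/* Bytes needed to pack that field for nstamps timestamps. */
static unsigned bytes_needed(uint64_t units_per_sec, unsigned nstamps)
{
        unsigned total_bits = bits_needed(units_per_sec) * nstamps;

        return (total_bits + 7) / 8;
}
```

Picoseconds are 40 bits each, so 4 timestamps take all 20 bytes
(6 + 6 + 8). Milliseconds are 10 bits each, so 3 timestamps fit in
4 bytes and 16 are left over. Microseconds on 3 timestamps is the
60 bits / 8 bytes case from earlier.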

This is what we have today:

struct ext2_inode {
        __u16 i_mode; /* mode (type & permission bits) */
        __u16 i_uid; /* owner's UID */
        __u32 i_size; /* size (low 32 bits) */
        __u32 i_atime; /* access time */
        __u32 i_ctime; /* inode change time */
        __u32 i_mtime; /* modification time */
        __u32 i_dtime; /* deletion time */
        __u16 i_gid; /* group ID */
        __u16 i_links_count; /* link count */
        __u32 i_blocks; /* block count */
        __u32 i_flags; /* ext2 flags */
        union {
                __u32 i_translator; /* translator (Hurd only) */
                __u32 i_res_fork; /* resource fork (Linux privs) */
        } u1;
        __u32 i_block[15]; /* block pointers */
        __u32 i_version; /* inode generation (for NFS) */
        __u32 i_file_acl; /* file ACL */
        union {
                __u32 i_dir_acl; /* directory ACL */
                __u32 i_size_high; /* high 32 bits of 64-bit size */
        } u2;
        __u32 i_reserved1;
        __u16 i_reserved2;
        __u16 i_mode_high; /* extra mode bits (Hurd only) */
        __u16 i_uid_high; /* high 16 bits of 32-bit UID */
        __u16 i_gid_high; /* high 16 bits of 32-bit GID */
        __u32 i_author; /* author (Hurd only) */
};
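
For the microsecond case, one way the 60 bits might be packed into
8 bytes. This is only a sketch: the layout (atime in the low 20 bits,
then ctime, then mtime), the function names, and which inode bytes
would hold the result are all my assumptions, and on-disk byte order
is ignored.

```c
#include <assert.h>
#include <stdint.h>

#define USEC_BITS 20
#define USEC_MASK ((UINT64_C(1) << USEC_BITS) - 1)

/* Pack three microsecond values (each 0..999999) into one 64-bit
 * word; the top 4 bits stay zero. */
static uint64_t pack_usecs(uint32_t a_us, uint32_t c_us, uint32_t m_us)
{
        assert(a_us < 1000000 && c_us < 1000000 && m_us < 1000000);
        return (uint64_t)a_us
             | ((uint64_t)c_us << USEC_BITS)
             | ((uint64_t)m_us << (2 * USEC_BITS));
}

/* Extract field 0 (atime), 1 (ctime) or 2 (mtime). */
static uint32_t unpack_usec(uint64_t packed, unsigned which)
{
        assert(which < 3);
        return (uint32_t)((packed >> (which * USEC_BITS)) & USEC_MASK);
}
```

All three fields get the same resolution this way, so setting them to
the same time gives values that compare equal.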