Re: stat64 for over 2TB file returned invalid st_blocks

From: Takashi Sato
Date: Sat Dec 10 2005 - 06:22:22 EST


Hi,

I prefer sector_t for i_blocks rather than newly defined blkcnt_t.
The reasons are:

- Both i_blocks and common sector_t are for on-disk 512-byte unit.
In this point of view, they have the same character.

One is a count of the number of blocks used by a file, and exists only
in order to help filesystems cache this value. The other is a handle to
a block. How is that the same?

Both i_blocks and sector_t handle on-disk blocks in a certain unit.

But their relation seems to be similar to the relation between off_t
and size_t of POSIX. So adding blkcnt_t makes sense too. I'll think
of this point further.

By the way, I think there is a kind of confusions on sector_t. Its
unit is 512 bytes on block/ and driver/ tree, but it is filesystem
block on fs/ tree. Does anyone think the latter should be fsblock_t
or something?

- If we created the type blkcnt_t newly, the patch would have to
touch a lot of files as follows, like sector_t does.
block/Kconfig, asm-i386/types.h, asm-x86_64/types.h,
asm-ppc/types.h, asm-s390/types.h, asm-sh/types.h,
asm-h8300/types.h, asm-mips/types.h
It will be simple if we use sector_t for i_blocks.

That is not a particularly good reason.

Also, I cannot imagine the situation that > 2TB files are used over
network with CONFIG_LBD disabled kernel. Is there such a thing
realistically?

CONFIG_LBD is enabled not only on HPC but on all major distribution for
server and desktop. The only exception is embedded system, isn't it?

Apart from this and the kstat wart, there is no reason to set CONFIG_LBD
for a networked filesystem. Why would you want to buy a > 2TB local disk
on an HPC cluster node if you already have a server?

I suppose we can make NFS use a private field instead, and just set
i_blocks to 0, but that's unnecessarily wasteful too.

I think your suggestion is effective only for the problem of stat64,
but other problems would be left on code which handles inode.i_blocks.
For example, ioctl with FIOQSIZE command returns the size of file's
data calculated from inode.i_blocks. If inode.i_blocks is 0, ioctl
with FIOQSIZE always returns 0 even if the file data exists.

-- Takashi Sato
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/