Re: Kernels v4.9+ cause short reads of block devices

From: Andreas Dilger
Date: Wed Aug 23 2017 - 17:01:53 EST


On Aug 23, 2017, at 2:13 PM, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Wed, Aug 23, 2017 at 12:53 PM, Doug Nazar <nazard@xxxxxxxx> wrote:
>>
>> It's compiling now, but I think it's already set to MAX_LFS_FILESIZE.
>>
>> [ 169.095127] ppos=80180006000, s_maxbytes=7ffffffffff, magic=0x62646576,
>> type=bdev
>
> Oh, right you are - I'm much too used to 64-bit, where
> MAX_LFS_FILESIZE is basically infinite, and was just assuming that it
> was something like the UFS bug we had not that long ago that was due
> to the 32-bit limit.
>
> But yes, on 32-bit, we are limited by the 32-bit index into the page
> cache, and we limit the index to 31 bits too, so we have (PAGE_SIZE <<
> 31) -1, which is that 7ffffffffff.
>
> And that also explains why people haven't seen it. You do need
>
> (a) 32-bit environment
>
> (b) a disk larger than that 8TB in size
>
> The *hard* limit for the page cache on a 32-bit environment should
> actually be (PAGE_SIZE << 32)-PAGE_SIZE (that final PAGE_SIZE
> subtraction is to make sure we don't generate that page cache with
> index -1), so having a disk that is 16TB or larger is not going to
> work, but your disk is right in that 8TB-16TB hole that used to work
> and was broken by that check.
>
> Anyway, that makes me feel better. I should have looked at your disk
> size more, now I at least understand why nobody noticed before.
>
> So just throw away my patch. That's wrong, and garbage.
>
> The *right* patch is likely to be just this instead:
>
> -#define MAX_LFS_FILESIZE (((loff_t)PAGE_SIZE << (BITS_PER_LONG-1))-1)
> +#define MAX_LFS_FILESIZE (((loff_t)PAGE_SIZE << BITS_PER_LONG)-PAGE_SIZE)
>
> which should make MAX_LFS_FILESIZE be 0xffffffff000 and your disk size
> should be ok.

Doug,
I noticed while checking for other implications of changing MAX_LFS_FILESIZE
that fs/jfs/super.c is also working around this limit. If you are going
to submit a patch for this, it also makes sense to fix jfs_fill_super() to
use MAX_LFS_FILESIZE instead of JFS rolling its own, something like:

	/* logical blocks are represented by 40 bits in pxd_t, etc.
	 * and page cache is indexed by long. */
	sb->s_maxbytes = min(((u64)sb->s_blocksize) << 40,
			     MAX_LFS_FILESIZE);

It also looks like ocfs2_max_file_offset() is trying to avoid overflowing
the old 31-bit limit, and isn't using MAX_LFS_FILESIZE directly, so it will
now be wrong. It looks like it could use "bitshift = 32; trim = bytes;",
but Joel or Mark should confirm.

Finally, there is a check in fs/super.c::mount_fs() that is verifying
s_maxbytes is not set too large, but this has been present since 2.6.32
and should probably be removed at this point, or changed to a BUG_ON()
(see commit 42cb56ae2ab for details).

Cheers, Andreas




