Possible ext2/3/4 filesystem iov_length integer overflow and strange behavior on large writes

From: halfdog
Date: Fri Jun 17 2011 - 12:43:22 EST

If I understand it correctly, there might be multiple iov_length
integer overflows on 32-bit architectures in ext2, ext3, and ext4, e.g.:


static ssize_t
ext4_file_write(struct kiocb *iocb, const struct iovec *iov,
                unsigned long nr_segs, loff_t pos)
        /*
         * If we have encountered a bitmap-format file, the size limit
         * is smaller than s_maxbytes, which is for extent-mapped files.
         */
        if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))) {
                struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
                /* << length might be any value with more than 4GB data */
                size_t length = iov_length(iov, nr_segs);

                if ((pos > sbi->s_bitmap_maxbytes ||
                    (pos == sbi->s_bitmap_maxbytes && length > 0)))
                        return -EFBIG;

                if (pos + length > sbi->s_bitmap_maxbytes) {
                        nr_segs = iov_shorten((struct iovec *)iov, nr_segs,
                                              sbi->s_bitmap_maxbytes - pos);

Can someone confirm or refute that? I wrote a small test program, but
failed to inflict damage on the kernel or filesystem, so I might have
missed something. Judging from a grep of the source, other filesystems
might have the same problem.
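
To make the suspected wrap concrete, here is a minimal user-space sketch. It
does not call any kernel code: iov_length_32bit() only emulates what a sum of
segment lengths would look like if size_t were 32 bits wide, and fill_layout()
is a hypothetical helper that assumes the 257-segment test case in this report
means 256 segments of 16777216 bytes plus one final segment of 10 bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Sum of segment lengths as a 32-bit size_t would hold it (wraps modulo 2^32). */
static uint32_t iov_length_32bit(const struct iovec *iov, unsigned long nr_segs)
{
    uint32_t total = 0;

    for (unsigned long seg = 0; seg < nr_segs; seg++)
        total += (uint32_t)iov[seg].iov_len;
    return total;
}

/* The same sum accumulated in 64 bits, for comparison. */
static uint64_t iov_length_64bit(const struct iovec *iov, unsigned long nr_segs)
{
    uint64_t total = 0;

    for (unsigned long seg = 0; seg < nr_segs; seg++)
        total += iov[seg].iov_len;
    return total;
}

/*
 * Hypothetical layout helper (an assumption about the test tool's flags):
 * count - 1 segments of seg_size bytes followed by one of last_size bytes.
 * iov_base stays NULL because the buffers are never dereferenced here.
 */
static void fill_layout(struct iovec *iov, unsigned long count,
                        size_t seg_size, size_t last_size)
{
    for (unsigned long i = 0; i < count; i++) {
        iov[i].iov_base = NULL;
        iov[i].iov_len = (i + 1 == count) ? last_size : seg_size;
    }
}
```

With 257 segments laid out this way, the 64-bit sum is 2^32 + 10 bytes, while
the emulated 32-bit sum is just 10, so a check such as
"pos + length > s_bitmap_maxbytes" would see a tiny length.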

Apart from that, large iov writes seem to be uninterruptible. Sending a
kill signal to the process in writev terminates it after finishing the write:

./LargeWritevTest --File x --IovecNum 257 --BufferSize 16777216 --LastSize 10
pkill -KILL LargeWritevTest

[24306.588390] INFO: task LargeWritevTest:1390 blocked for more than 120
[24306.589984] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[24306.590512] WritevTest D 00000086 0 1390 1380 0x00000004
[24306.590571] c8a91db0 00000082 c1040b73 00000086 00000000 c86a1940 c86a1bcc c183a8c0
[24309.657798] 8dcb7199 000014fc c86a1bc8 c183a8c0 c183a8c0 cac068c0 c86a1940 c87e0ca0
[24309.657871] cac03640 c8605ae8 000581ca 00000380 00000000 00000001 c8a91d90 c103351c
[24309.657908] Call Trace:
[24309.658226] [<c1040b73>] ? entity_tick+0x73/0x130
[24309.658284] [<c103351c>] ? kmap_atomic_prot+0x4c/0x100
[24309.658331] [<c10e7dc0>] ? prep_new_page+0x110/0x1a0
[24309.658439] [<c15087e6>] __mutex_lock_slowpath+0xd6/0x140
[24309.658526] [<c1508355>] mutex_lock+0x25/0x40
[24309.658547] [<c10e3c1b>] generic_file_aio_write+0x4b/0xd0
[24309.658587] [<c11a9a84>] ext4_file_write+0x54/0x2a0
[24309.658608] [<c10e8809>] ? __alloc_pages_nodemask+0xf9/0x710
[24309.658627] [<c10e8809>] ? __alloc_pages_nodemask+0xf9/0x710
[24309.658805] [<c11a9a30>] ? ext4_file_write+0x0/0x2a0
[24309.660607] [<c1127676>] do_sync_readv_writev+0xa6/0xe0

Since writev allows 1024 segments of 1GB each, one might be able to
consume 1TB (all) of the disk space on a machine while the process cannot
be stopped. On 32-bit architectures, the write stops after 2GB, but I'm
not sure why. Would terabyte writes be possible on 64-bit systems?
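
The two numbers in play here can be checked with a little arithmetic. The
sketch below assumes 4096-byte pages (ASSUMED_PAGE_SIZE is a name made up for
this example). It shows that 1024 segments of 1GB add up to 2^40 bytes, and
that the largest page-aligned count that still fits in a signed 32-bit int is
2^31 - 4096 = 2147479552. That resembles how Linux defines its per-call
read/write limit (MAX_RW_COUNT, i.e. INT_MAX masked to a page boundary), but
whether that limit is what stopped the write at 2GB is an assumption, not
something verified here.

```c
#include <limits.h>
#include <stdint.h>

/* Assumption for this sketch: 4 KiB pages. */
#define ASSUMED_PAGE_SIZE 4096u

/* Largest page-aligned byte count that still fits in a signed 32-bit int. */
static uint32_t page_aligned_int_max(void)
{
    return (uint32_t)INT_MAX & ~(uint32_t)(ASSUMED_PAGE_SIZE - 1);
}

/* Total bytes requested by nr_segs segments of seg_size bytes each. */
static uint64_t total_request(uint64_t nr_segs, uint64_t seg_size)
{
    return nr_segs * seg_size;
}
```

Here total_request(1024, 1 << 30) is 1099511627776 (1TB), and
page_aligned_int_max() is 2147479552.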

On 32-bit, one has to fork and call write on different files instead.
Since the processes cannot be terminated, a reboot does not unmount
cleanly, which might increase the likelihood of disk corruption.

For testing I used
on an ext4 filesystem, but failed to understand the various outcomes.
Especially incomprehensible was the oscillation between disk-full and
disk-free when writing with O_DIRECT to a disk without enough free
space. The behavior also changed unexpectedly when aligning the memory
buffers to page size or ext block size, or when doing unaligned I/O.
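
For readers without the test program, the following is a rough sketch of the
kind of call being exercised. It is not the LargeWritevTest source:
large_writev() and its parameters are hypothetical, and the layout (all
segments but the last are buffer_size bytes, the last is last_size bytes) is
an assumption about what the command-line flags mean. All segments point at
one shared buffer so that memory use stays small even when the total request
is large. O_DIRECT and buffer alignment are left out.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Issue one writev() with iovec_num segments to the file at path.
 * Returns the writev() result, or -1 if setup fails.
 */
static ssize_t large_writev(const char *path, int iovec_num,
                            size_t buffer_size, size_t last_size)
{
    if (iovec_num < 1 || buffer_size == 0 || last_size > buffer_size)
        return -1;

    char *buffer = malloc(buffer_size);
    struct iovec *iov = calloc((size_t)iovec_num, sizeof(*iov));
    if (buffer == NULL || iov == NULL) {
        free(buffer);
        free(iov);
        return -1;
    }
    memset(buffer, 'x', buffer_size);

    /* Every segment reuses the same buffer; only the last one is shorter. */
    for (int i = 0; i < iovec_num; i++) {
        iov[i].iov_base = buffer;
        iov[i].iov_len = (i + 1 == iovec_num) ? last_size : buffer_size;
    }

    ssize_t written = -1;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd >= 0) {
        written = writev(fd, iov, iovec_num);
        close(fd);
    }

    free(iov);
    free(buffer);
    return written;
}
```

Under the assumed flag meaning, large_writev("x", 257, 16777216, 10) would
correspond to the 257-segment case in this report; small values are safer for
a first try.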

7G free:
./LargeWritevTest --File x --IovecNum 256 --BufferSize 16777216
./LargeWritevTest --File x --IovecNum 257 --BufferSize 16777216 --LastSize 10
./LargeWritevTest --File y --IovecNum 512 --BufferSize 16777216 --LastSize 16777215
Write result 2147479552 (is 2^31-4096)

./LargeWritevTest --File x --IovecNum 257 --BufferSize 16777216 --LastSize 10 --Align 65536
Write result 16740352 (fast)

3.9G free:
./LargeWritevTest --File x --IovecNum 257 --BufferSize 16777216 --LastSize 10 --Align 65536 --Direct
./LargeWritevTest --File x --IovecNum 256 --BufferSize 16777216 --Align 65536 --Direct
Write result -14 (immediate)

./LargeWritevTest --File x --IovecNum 257 --BufferSize 16777216 --LastSize 10 --Direct
./LargeWritevTest --File x --IovecNum 256 --BufferSize 16777216 --Direct
Write result -22 (immediate)

Less than 2GB:
./LargeWritevTest --File z --IovecNum 257 --BufferSize 16777216 --LastSize 10 --Align 4096 --Direct
Oscillates between disk empty/full?
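
For reading the negative results above: assuming the tool prints the negated
errno value when writev fails (an assumption, since its source is not shown
here), -14 would be EFAULT and -22 would be EINVAL on Linux. A small
hypothetical helper, describe_result(), to decode such values:

```c
#include <errno.h>
#include <string.h>

/*
 * Turn a "Write result" value into a short description.
 * Assumption: negative values are negated errno codes.
 * On Linux, 14 is EFAULT (bad address) and 22 is EINVAL (invalid argument).
 */
static const char *describe_result(long result)
{
    if (result >= 0)
        return "bytes written";
    return strerror((int)-result);
}
```

describe_result(-14) and describe_result(-22) return the C library's message
text for EFAULT and EINVAL respectively.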

--
PGP: 156A AE98 B91F 0114 FE88 2BD8 C459 9386 feed a bee
