Re: ext4+quota+nfs issue

From: Jan Kara
Date: Mon Sep 14 2009 - 13:51:10 EST


Hello,

On Fri 11-09-09 16:33:48, Pavol Cvengros wrote:
> On 9/9/2009 9:02 PM, Pavol Cvengros wrote:
>> On 9/9/2009 7:45 PM, Justin Maggard wrote:
>>> On Wed, Sep 9, 2009 at 8:02 AM, Eric Sandeen <sandeen@xxxxxxxxxx> wrote:
>>>>> On Wed, 9 Sep 2009, Pavol Cvengros wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> can somebody who is aware of ext4 and quota have a look on this one?
>>>>>>
>>>> This was also just reported at:
>>>>
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=521914
>>>>
>>>> -Eric
>>>>
>>> I've seen exactly the same thing myself, but with local I/O.
>>> The only difference I could find between the filesystems that
>>> showed this issue and those that didn't was how they were
>>> created. The filesystems without the issue were made with
>>> mkfs.ext4, and the ones that _did_ have it were created with
>>> mkfs.ext3 and then mounted -t ext4. Pavol, can you check your
>>> filesystem features with "dumpe2fs -h [your_device]"?
>>>
>>> -Justin
>>
>> here is the dump....
>>
>> host_stor0 ~ # dumpe2fs -h /dev/sdb1
>> dumpe2fs 1.41.9 (22-Aug-2009)
>> Filesystem volume name: <none>
>> Last mounted on: <not available>
>> Filesystem UUID: f8aef49b-1903-4e25-9a7b-a3f5557107fb
>> Filesystem magic number: 0xEF53
>> Filesystem revision #: 1 (dynamic)
>> Filesystem features: has_journal ext_attr resize_inode dir_index
>> filetype needs_recovery extent flex_bg sparse_super large_file huge_file
>> uninit_bg dir_nlink extra_isize
>> Filesystem flags: signed_directory_hash
>> Default mount options: (none)
>> Filesystem state: clean
>> Errors behavior: Continue
>> Filesystem OS type: Linux
>> Inode count: 305176576
>> Block count: 1220689911
>> Reserved block count: 12206899
>> Free blocks: 977820919
>> Free inodes: 250981592
>> First block: 0
>> Block size: 4096
>> Fragment size: 4096
>> Reserved GDT blocks: 732
>> Blocks per group: 32768
>> Fragments per group: 32768
>> Inodes per group: 8192
>> Inode blocks per group: 512
>> Flex block group size: 16
>> Filesystem created: Tue Jun 30 20:04:20 2009
>> Last mount time: Tue Aug 18 12:21:18 2009
>> Last write time: Tue Aug 18 12:21:18 2009
>> Mount count: 10
>> Maximum mount count: -1
>> Last checked: Tue Jun 30 20:04:20 2009
>> Check interval: 0 (<none>)
>> Lifetime writes: 73 GB
>> Reserved blocks uid: 0 (user root)
>> Reserved blocks gid: 0 (group root)
>> First inode: 11
>> Inode size: 256
>> Required extra isize: 28
>> Desired extra isize: 28
>> Journal inode: 8
>> Default directory hash: half_md4
>> Directory Hash Seed: 317c2fc4-9c86-42ca-a3c3-0d6c632dcb46
>> Journal backup: inode blocks
>> Journal size: 128M

I've found some time to look into this and I can see a few problems in
the code.

Firstly, what may be causing your problem: vfs_dq_claim_blocks() is
called in ext4_mb_mark_diskspace_used(), but as far as I understand the
code, ext4_mb_normalize_request() can increase the amount of space we
really allocate, and thus we can try to claim more blocks than we have
actually reserved in quota. Aneesh, is that right?
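
Roughly, the flow I suspect is the following (a simplified sketch, not
the actual code; the allocation-context field names are from memory):

	/* delayed allocation: quota is reserved up front for the
	 * data blocks plus estimated metadata */
	ext4_da_reserve_space(inode, nrblocks);

	/* at writeback time the allocator may round the request up,
	 * e.g. for preallocation: */
	ext4_mb_normalize_request(ac, ar);

	/* ext4_mb_mark_diskspace_used() then claims what was really
	 * allocated, which can exceed the earlier reservation: */
	vfs_dq_claim_blocks(inode, ac->ac_b_ex.fe_len);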

Secondly, ext4_da_reserve_space() seems to have a bug where it can
reserve quota blocks multiple times if ext4_claim_free_blocks() fails
and we retry the allocation. We should release the quota reservation
before restarting.

Actually, when we find out we cannot reserve quota space, we could
force some delayed-allocation writes to disk (and thus possibly release
some quota in case we have overestimated the number of blocks needed).
But that's a different issue.
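
For illustration, the problematic loop looks roughly like this
(simplified; using vfs_dq_release_reservation_block() as the matching
release call is my assumption):

repeat:
	/* reserve quota for the data block plus worst-case metadata */
	if (vfs_dq_reserve_block(inode, total))
		return -EDQUOT;

	if (ext4_claim_free_blocks(sbi, total)) {
		/* the reservation taken above is never dropped, so
		 * each pass through the loop reserves "total" again;
		 * something like this is needed before retrying: */
		vfs_dq_release_reservation_block(inode, total);
		if (ext4_should_retry_alloc(inode->i_sb, &retries))
			goto repeat;
		return -ENOSPC;
	}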

Thirdly, ext4_indirect_calc_metadata_amount() is wrong for sparse
files. The worst case is 3 metadata blocks per data block if we make
the file sufficiently sparse, and there's no easy way around that...
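
To spell the worst case out: with the classic indirect scheme a block
far into the file is reached via triple -> double -> single indirect
blocks. If writes land so sparsely that no two data blocks share any
indirect block, each data block can force allocating one fresh block
at all three levels (the function below is hypothetical, just to show
the pessimistic-but-safe estimate):

	/*
	 *	inode -> [tind] -> [dind] -> [ind] -> data
	 */
	static int calc_indirect_metadata_worst_case(int nrblocks)
	{
		/* one single, double and triple indirect block may be
		 * needed per data block for a maximally sparse file */
		return 3 * nrblocks;
	}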

Honza
--
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR