Re: ftruncate-mmap: pages are lost after writing to mmaped file.

From: Ying Han
Date: Thu Apr 02 2009 - 20:13:29 EST


On Thu, Apr 2, 2009 at 4:24 AM, Nick Piggin <nickpiggin@xxxxxxxxxxxx> wrote:
> On Thursday 02 April 2009 09:36:13 Ying Han wrote:
>> Hi Jan:
>> I feel that the problem you saw is somewhat different from mine. You
>> mentioned that you saw the PageError() message, which I don't see
>> on my system. I tried your patch (based on 2.6.21) on my system and
>> it ran OK for 2 days. Still, since I don't see the same error message
>> that you saw, I am not convinced this is the root cause, at least for
>> our problem. I am still looking into it.
>> So, are you seeing the PageError() every time the problem happens?
>
> So I asked if you could test with my workaround of taking truncate_mutex
> at the start of ext2_get_blocks, and report back. I never heard of any
> response after that.

I applied the change and still get the same issue, unless I didn't do
it right. Here is the patch I applied, which takes truncate_mutex at
the beginning of ext2_get_blocks:

diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 384fc0d..94cf773 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -586,10 +586,13 @@ static int ext2_get_blocks(struct inode *inode,
int count = 0;
ext2_fsblk_t first_block = 0;

+ mutex_lock(&ei->truncate_mutex);
depth = ext2_block_to_path(inode,iblock,offsets,&blocks_to_boundary);

- if (depth == 0)
+ if (depth == 0) {
+ mutex_unlock(&ei->truncate_mutex);
return (err);
+ }
reread:
partial = ext2_get_branch(inode, depth, offsets, chain, &err);

@@ -625,7 +628,7 @@ reread:
if (!create || err == -EIO)
goto cleanup;

- mutex_lock(&ei->truncate_mutex);

/*
* Okay, we need to do block allocation. Lazily initialize the block
@@ -651,7 +654,7 @@ reread:
offsets + (partial - chain), partial);

if (err) {
- mutex_unlock(&ei->truncate_mutex);
goto cleanup;
}

@@ -662,13 +665,13 @@ reread:
err = ext2_clear_xip_target (inode,
le32_to_cpu(chain[depth-1].key));
if (err) {
- mutex_unlock(&ei->truncate_mutex);
goto cleanup;
}
}

ext2_splice_branch(inode, iblock, partial, indirect_blks, count);
- mutex_unlock(&ei->truncate_mutex);
set_buffer_new(bh_result);
got_it:
map_bh(bh_result, inode->i_sb, le32_to_cpu(chain[depth-1].key));
@@ -678,6 +681,7 @@ got_it:
/* Clean up and exit */
partial = chain + depth - 1; /* the whole chain */
cleanup:
+ mutex_unlock(&ei->truncate_mutex);
while (partial > chain) {
brelse(partial->bh);
partial--;
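
For readers who want the locking shape without the ext2 details, here is a
minimal user-space sketch of what the patch above does: take the mutex
before the lookup, hold it across allocation, and release it on a single
exit path. It is only an illustration under stated assumptions, not kernel
code. A pthread mutex stands in for ei->truncate_mutex, and struct
toy_inode, toy_inode_init, toy_get_block and toy_truncate are hypothetical
names that do not exist in fs/ext2.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define TOY_NBLOCKS 8

/* Hypothetical stand-in for an inode's block map; not the ext2 layout. */
struct toy_inode {
	pthread_mutex_t truncate_mutex;	/* plays the role of ei->truncate_mutex */
	unsigned long map[TOY_NBLOCKS];	/* 0 means "no block mapped" */
	unsigned long next_free;	/* next physical block number to hand out */
};

static void toy_inode_init(struct toy_inode *ti)
{
	size_t i;

	pthread_mutex_init(&ti->truncate_mutex, NULL);
	for (i = 0; i < TOY_NBLOCKS; i++)
		ti->map[i] = 0;
	ti->next_free = 100;
}

/*
 * Look up logical block iblock, allocating it if create is nonzero.
 * The mutex is taken before the lookup and dropped on the one exit path,
 * so a concurrent toy_truncate() cannot run between lookup and allocation.
 * Returns 0 on success (*result is 0 if unmapped and !create),
 * or -1 if iblock is out of range.
 */
static int toy_get_block(struct toy_inode *ti, size_t iblock, int create,
			 unsigned long *result)
{
	int err = 0;

	assert(result != NULL);
	pthread_mutex_lock(&ti->truncate_mutex);

	if (iblock >= TOY_NBLOCKS) {
		err = -1;
		goto out;
	}
	if (ti->map[iblock] == 0 && create)
		ti->map[iblock] = ti->next_free++;
	*result = ti->map[iblock];
out:
	pthread_mutex_unlock(&ti->truncate_mutex);
	return err;
}

/* Drop every mapping at or beyond new_blocks, under the same mutex. */
static void toy_truncate(struct toy_inode *ti, size_t new_blocks)
{
	size_t i;

	pthread_mutex_lock(&ti->truncate_mutex);
	for (i = new_blocks; i < TOY_NBLOCKS; i++)
		ti->map[i] = 0;
	pthread_mutex_unlock(&ti->truncate_mutex);
}
```

The single "out:" label mirrors the way the patch moves the unlock to the
"cleanup:" label, so that every return path releases the mutex exactly once.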

--Ying

>
> To reiterate: I was able to reproduce a problem with ext2 (I was testing
> on brd to get IO rates high enough to reproduce it quite frequently).
> I think I narrowed the problem down to block allocation or inode block
> tree corruption because I was unable to reproduce it with that hack in
> place.
>
>