Re: EXT4 ENOSPC Bug

From: Andres Freund
Date: Mon Dec 01 2008 - 15:16:27 EST


Hi Andreas, Hi all,

On Monday 01 December 2008 20:42:04 Andreas Dilger wrote:
> On Dec 01, 2008 13:34 +0100, Andres Freund wrote:
> > Are there any additional informations you could use?
> > The filesystem is a bit big for you to download unfortunately (and the
> > testdata it contains a bit to sensitive).
> - did you run "e2fsck -f" to see if there were any errors in the
> filesystem?
Yes, no errors.

> - do you run any specific applications that seem to trigger the
> problem (e.g. Vuze (formerly azureus) as was reported by another user)
No. The Problem occured sometimes shortly after boot (Once even before X was
up) sometimes only after days of full usage.
Boot starts postgres, thats the only thing that propably always ran. But I
have seen it while postgres was idle - i.e. just some polls and stats.

> - do the applications writing to this file have any unusual IO pattern
> (e.g. mmap IO, lots of write+truncate+write on the same file, etc)
Its not a single file, but the whole filesystem, having problems.

> We discussed the creation of a debugging patch to help diagnose this
> problem. It looks like you have already compiled your own kernel, so
> I assume it would be possible for you to run with an additional patch?
Sure, no problem.
At least if the patch is not expensive during run time. The system produces
test-data for testing our development software and I cant take it down for too
long.
Thats the reason why I can use something like ext4 on such a system at all ;-)

It sometimes takes quite a while to reproduce the problem though. The system
ran stable for 4 days so far.
Same kernel as the last time the error occured though.

Any specific debug options I should enable?

Andres

Attachment: signature.asc
Description: This is a digitally signed message part.