Re: writing file to disk: not as easy as it looks

From: Mikulas Patocka
Date: Tue Dec 02 2008 - 18:26:53 EST


On Tue, 2 Dec 2008, Pavel Machek wrote:

> Actually, it looks like the POSIX file interface is on the lowest step of
> Rusty's scale: one that is impossible to use correctly. Yes, it seems
> impossible to reliably and safely write a file to disk under Linux. Double
> plus uncool.
>
> So... how do you write a file to disk and wait for it to reach stable
> storage, with proper error handling?
>
> > file
>
> ...does not work, because it fails to check for errors.
>
> touch file || error_handling
>
> is not a lot better, unless you mount your filesystems "sync"
> ... and no one does that.
>
> dd conv=fsync if=something of=file 2> /dev/null || error_handling
>
> is a bit better, but still not enough unless you mount your filesystems
> "dirsync": the file data is on disk, but there is no directory entry
> pointing to it. No one uses dirsync.
>
> So you need something like
>
> dd conv=fsync if=something of=file 2> /dev/null || error_handling
> fsync . || error_handling
> fsync .. || error_handling
> fsync ../.. || error_handling
> fsync ../../.. || error_handling
>
> ... which mostly works...
>
> If you are alone on the filesystem, that is. fsync only returns
> errors to the first process that asks for them, so if some other
> process also does fsync on ".", it may consume "your" error and you
> never learn of the problem.
>
> Question is... is there a way that I missed and that actually works?
> Pavel

Hi!

I think you are right about this. There's no way to fsync a directory
reliably.
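
For the record, the closest userspace comes is something like the C
below. This is a minimal sketch only --- "file" and "." are placeholders
and error handling is reduced to aborting --- and it is still unreliable
for exactly the reason you describe: the fsync() on the directory may
never see the error if another process consumed it first.

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static void die(const char *what)
{
	perror(what);
	exit(1);
}

int main(void)
{
	const char *data = "important data\n";
	int fd, dir;

	fd = open("file", O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		die("open file");
	if (write(fd, data, strlen(data)) != (ssize_t)strlen(data))
		die("write");
	if (fsync(fd))			/* push file data and inode to disk */
		die("fsync file");
	if (close(fd))
		die("close");

	/* now push the directory entry itself to disk */
	dir = open(".", O_RDONLY);
	if (dir < 0)
		die("open .");
	if (fsync(dir))			/* the error here may already have
					   been consumed by another process */
		die("fsync .");
	close(dir);
	return 0;
}

(For a freshly created path you would repeat the directory fsync for
every new ancestor --- your ../.. chain.)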


My idea is that when the filesystem hits a metadata write error, it should
stop committing any transactions and return an error to all further writes.
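
In outline it is just a latched error flag. Below is a toy model in
userspace C, for illustration only --- not kernel code; fs_write(),
device_write() and the failing block number are all invented:

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

static bool fs_failed;	/* latched on the first metadata write error */

/* invented low-level block write: fails on block 2 to simulate
   bad cabling, a voltage drop or a firmware bug */
static int device_write(int block)
{
	return block == 2 ? -1 : 0;
}

/* the proposed policy: after the first failure, refuse every further
   write with EIO instead of risking cross-linked files */
static int fs_write(int block)
{
	if (fs_failed) {
		errno = EIO;
		return -1;
	}
	if (device_write(block)) {
		fs_failed = true;	/* freeze the filesystem */
		errno = EIO;
		return -1;
	}
	return 0;
}

int main(void)
{
	int block;

	for (block = 0; block < 5; block++) {
		if (fs_write(block))
			printf("write to block %d refused: %s\n",
			       block, strerror(errno));
		else
			printf("write to block %d ok\n", block);
	}
	return 0;
}

ext2/ext3 go in this direction with the errors=remount-ro mount option,
though remounting read-only after the error is noticed is not quite the
same as refusing all further commits.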

Write errors don't happen because of physical errors on the media --- all
current disks have sector reallocation. Write errors can happen because of
bad cabling, voltage drops, firmware bugs, corruption of the PCI bus by a
rogue card, etc. Most of these cases are fixable by the administrator.

If you continue operating the filesystem after a write error (it doesn't
matter whether you report the error to userspace or not), you are risking
filesystem damage (for example, cross-linked files if there was an error
writing the bitmap) or a security breach (users reading blocks containing
deleted data of other users). If you freeze the filesystem on the first
write error and do not allow further writes, the administrator can fix the
underlying problem and the computer will run without any data damage or
security problems.


It happened to me just yesterday: my disk was spinning down and up
repeatedly and returning errors because of insufficient power. My kernel
kicked the spadfs filesystem offline on the first write error and didn't
allow any further commits. I fixed the problem by adding a second power
supply and connecting some of the disks to it --- and now, after the
incident, there are zero corruptions. Just imagine what a massacre would
have happened on the filesystem if the kernel hadn't kicked it offline
and it had kept operating under the condition "some writes get through,
some don't", unattended, for some time.

Mikulas