Re: POSIX violation by writeback error

From: Austin S. Hemmelgarn
Date: Wed Sep 05 2018 - 08:07:32 EST


On 2018-09-05 04:37, 焦晓冬 wrote:
On Wed, Sep 5, 2018 at 4:04 PM Rogier Wolff <R.E.Wolff@xxxxxxxxxxxx> wrote:

On Wed, Sep 05, 2018 at 09:39:58AM +0200, Martin Steigerwald wrote:
Rogier Wolff - 05.09.18, 09:08:
So when a mail queuer puts mail in the mailq files and the mail processor
can get them out of there intact, nobody is going to notice. (I know
mail queuers should call fsync and report errors when that fails, but
there are bound to be applications where calling fsync is not
appropriate (*))

AFAIK at least Postfix MDA only reports mail as being accepted over SMTP
once fsync() on the mail file completed successfully. And I'd expect
every sensible MDA to do this. I don't know how Dovecot MDA, which I
currently use for sieve support, does this though.


Is every mail client implementation really going to call fsync()? Why
would they call fsync(), when fsync() is meant to persist the file on
disk, which is apparently unnecessary if the task of delivering it over
SMTP won't be restarted after a reboot?
Not mail clients, the actual servers. If they implement the SMTP standard correctly, they _have_ to call fsync() before they return that an email was accepted for delivery or relaying, because SMTP requires that a successful return means that the system can actually attempt delivery, which is not guaranteed if they haven't verified that it's actually written out to persistent storage.
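
Roughly, the write side of that looks like the sketch below (a hypothetical helper, not lifted from any actual MTA): the message is only acknowledged once write(), fsync() and close() have all succeeded, so a writeback error shows up before the "250 OK" reply goes out instead of after.

/* Hypothetical sketch: persist a queued message before acknowledging it.
 * Returns 0 only if the data is known to be on stable storage; a real
 * MTA would also fsync the queue directory after creating the file. */
#include <fcntl.h>
#include <unistd.h>

static int queue_message(const char *path, const char *msg, size_t len)
{
        int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
        if (fd < 0)
                return -1;

        while (len > 0) {
                ssize_t n = write(fd, msg, len);
                if (n < 0) {            /* write error: do not acknowledge */
                        close(fd);
                        return -1;
                }
                msg += n;
                len -= (size_t)n;
        }

        if (fsync(fd) < 0) {            /* writeback error surfaces here... */
                close(fd);
                return -1;              /* ...so the SMTP reply can be 4xx */
        }
        return close(fd);               /* only now send "250 OK" */
}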

Yes. That's why I added the remark that mailers will call fsync and know
about it on the write side. In the last few days I encountered a situation
where a developer, running into this while developing, would have been led
to write:
/* Calling this fsync causes unacceptable performance */
// fsync (fd);

I know of an application somewhere that does realtime-gathering of
call-records (number X called Y for Z seconds). They come in from a
variety of sources, get de-duplicated, standardized, and written to
files. Then different output modules push the data to the different
consumers within the company. Billing among them.

Now getting old data there would be pretty bad. And calling fsync
all the time might have performance issues....

That's the situation where "old data is really bad".

But when apt-get upgrade replaces your /bin/sh and gets a write error,
returning an error on subsequent reads is really bad.

At this point, /bin/sh may be partially old and partially new. Executing
this corrupted binary is also dangerous, though.
But the system may still be usable in that state, while returning an error there guarantees it isn't. This is, in general, not the best example though, because no sane package manager directly overwrites _anything_: they all do some variation on replace-by-rename and call fsync _before_ renaming, so this situation is not realistically going to happen on any real system.
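
For what it's worth, that replace-by-rename pattern looks roughly like the sketch below (names are illustrative, not taken from any particular package manager): write the new content to a temporary file, fsync() it, rename() it over the target, then fsync() the directory so the rename itself is durable. A write error or crash at any point leaves either the complete old /bin/sh or the complete new one, never a half-written mix.

/* Illustrative replace-by-rename: the old file stays intact until the
 * rename, and the rename itself is made durable by fsyncing the directory. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static int replace_file(const char *dir, const char *tmp,
                        const char *target, const void *data, size_t len)
{
        int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0755);
        if (fd < 0)
                return -1;
        if (write(fd, data, len) != (ssize_t)len || fsync(fd) < 0) {
                close(fd);
                unlink(tmp);
                return -1;              /* error reported before the swap */
        }
        if (close(fd) < 0) {
                unlink(tmp);
                return -1;
        }
        if (rename(tmp, target) < 0)    /* atomic: old or new, never a mix */
                return -1;

        int dfd = open(dir, O_RDONLY | O_DIRECTORY);
        if (dfd < 0)
                return -1;
        int ret = fsync(dfd);           /* make the rename itself durable */
        close(dfd);
        return ret;
}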