sync, reboot, and corrupting data [was Re: 2.6.29 -mm merge plans]

From: Pavel Machek
Date: Sat Jan 10 2009 - 16:30:36 EST


On Sat 2009-01-10 16:07:29, Andi Kleen wrote:
> On Thu, Jan 08, 2009 at 02:24:55PM +0100, Pavel Machek wrote:
> > On Wed 2009-01-07 03:57:25, Andi Kleen wrote:
> > > > sys_sync B which is invoked *after* sys_sync caller A should not
> > > > return before A. If you didn't have a global lock, they'd tend to
> > > > block one another's pages anyway. I think it's OK.
> > >
> > > It means that you cannot reboot because reboot does sync.
> > > What happens when the sync gets stuck somewhere on a really
> > > slow device?
> >
> > And what do you propose? Silently corrupt data on the slow device?
>
> Yes not writing is better than being unable to reboot.

Disagreed.

> There should be always a timeout at least for the reboot case.

Why? If you don't want to sync the data on disk, don't call the sync
syscall.

> Consider it from a uptime perspective: if something is really
> screwed up (and that happens sometimes; classical example
> was the IO stack getting hung up forever in error handling
> loops) the only way to get running again is to reboot and try again.

reboot() syscall should not sync.

sync() syscall should not return unless data are written on disk,
however slow that may be.

maybe reboot utility should not call sync()...
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/