Re: [PATCH] was: [RFC] removal of sync in panic

From: Christian Borntraeger
Date: Thu Jul 15 2004 - 00:00:20 EST


Andrew Morton wrote:
> I agree with the patch in principle, but I'd be interested in what
> observed problem motivated it?

see the first posting.

-----------snip--------------
I have seen panic failing two times lately on an SMP system. The box
panic'ed but was running happily on the other cpus. The culprit of this
failure is the fact, that these panics have been caused by a block device
or a filesystem (e.g. using errors=panic). In these cases the likelihood
of a failure/hang of sys_sync() is high. This is exactly what happened in
both cases I have seen. Meanwhile the other cpus are happily continuing
destroying data as the kernel has a severe problem but its not aware of
that as smp_send_stop happens after sys_sync.

I can imagine several changes but I am not sure if this is a problem which
must be fixed and which fix is the best.
Here are my alternatives:

1. remove sys_sync completely: syslogd and klogd use fsync. No need to help
them. Furthermore we have a severe problem which is worth a panic, so we
better dont do any I/O.
2. move smp_send_stop before sys_sync. This at least prevents other cpus of
doing harm if sys_sync hangs. Here I am not sure if this is really working.
3. Add an
if (doing_io())
printk(KERN_EMERG "In I/O routine - not syncing\n");
check like in_interrupt check. Unfortunately I have no clue how this can be
achieved and it looks quite ugly.
---------------------
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/