Re: 2.6.11: kernel BUG at fs/jbd/checkpoint.c:247! - Also on 2.6.12-rc5

From: Andrew Morton
Date: Tue Jun 14 2005 - 15:04:32 EST


John Baboval <baboval@xxxxxxxxxxxxx> wrote:
>
> On Fri, 3 Jun 2005 16:33:56 -0700 Andrew Morton <akpm@xxxxxxxx> wrote:
> > Please test
> >
> > ftp://ftp.kernel.org/pub/linux/kernel/v2.6/testing/linux-2.6.12-rc5.tar.bz2
> > plus
> > ftp://ftp.kernel.org/pub/linux/kernel/v2.6/snapshots/patch-2.6.12-rc5-git8.bz2
>
>
> I've just reproduced this on 2.6.12-rc5.
>
> It was happening every 4 hours or so with 2.6.11.9. 2.6.12-rc5 ran for a week... The system is being used as a fairly high-traffic NFS server.

Could you try this, please?


From: Jan Kara <jack@xxxxxxx>

On one path, cond_resched_lock() fails to return true if it dropped the lock.
We think this might be causing the crashes in JBD's log_do_checkpoint().

Cc: Ingo Molnar <mingo@xxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxx>
---

kernel/sched.c | 7 +++++--
1 files changed, 5 insertions(+), 2 deletions(-)

diff -puN kernel/sched.c~cond_resched-lock-fix kernel/sched.c
--- 25/kernel/sched.c~cond_resched-lock-fix Mon Jun 13 15:51:22 2005
+++ 25-akpm/kernel/sched.c Mon Jun 13 15:51:22 2005
@@ -3755,19 +3755,22 @@ EXPORT_SYMBOL(cond_resched);
  */
 int cond_resched_lock(spinlock_t * lock)
 {
+	int ret = 0;
+
 	if (need_lockbreak(lock)) {
 		spin_unlock(lock);
 		cpu_relax();
+		ret = 1;
 		spin_lock(lock);
 	}
 	if (need_resched()) {
 		_raw_spin_unlock(lock);
 		preempt_enable_no_resched();
 		__cond_resched();
+		ret = 1;
 		spin_lock(lock);
-		return 1;
 	}
-	return 0;
+	return ret;
 }
 
 EXPORT_SYMBOL(cond_resched_lock);
_
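
For illustration only, here is a minimal user-space sketch of the return-value
problem the changelog describes. It is not kernel code: struct mock_lock, its
fields, and the helper names are hypothetical stand-ins, with the "contended"
flag modelling need_lockbreak() and the "resched" flag modelling need_resched().
The point is that the old logic can release and retake the lock on the
lockbreak path while still reporting 0 to the caller.

```c
#include <stdbool.h>

/* Hypothetical user-space stand-in for a spinlock; not a kernel type. */
struct mock_lock {
	bool held;
	bool contended;	/* models need_lockbreak() being true */
	bool resched;	/* models need_resched() being true */
	int drops;	/* number of times the lock was released */
};

static void mock_unlock(struct mock_lock *l)
{
	l->held = false;
	l->drops++;
}

static void mock_relock(struct mock_lock *l)
{
	l->held = true;
}

/* Pre-patch logic: the lockbreak path drops the lock but returns 0. */
static int cond_resched_lock_old(struct mock_lock *l)
{
	if (l->contended) {
		mock_unlock(l);
		l->contended = false;
		mock_relock(l);
	}
	if (l->resched) {
		mock_unlock(l);
		l->resched = false;
		mock_relock(l);
		return 1;
	}
	return 0;
}

/* Post-patch logic: any path that dropped the lock reports 1. */
static int cond_resched_lock_new(struct mock_lock *l)
{
	int ret = 0;

	if (l->contended) {
		mock_unlock(l);
		l->contended = false;
		ret = 1;
		mock_relock(l);
	}
	if (l->resched) {
		mock_unlock(l);
		l->resched = false;
		ret = 1;
		mock_relock(l);
	}
	return ret;
}
```

A caller that walks a lock-protected list and restarts only when the return
value is nonzero would, with the old logic, keep using pointers that may have
gone stale while the lock was briefly released on the lockbreak path.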
