[PATCH] Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009

From: Chris Adams
Date: Fri Jan 02 2009 - 23:41:58 EST


Once upon a time, Duane Griffin <duaneg@xxxxxxxxx> said:
> On Fri, Jan 02, 2009 at 06:21:14PM -0600, Chris Adams wrote:
> > In any case, the quick-n-dirty fix would be to not try to printk while
> > holding xtime_lock (I think the NTP code is the only thing that does).
> > However, it would be nice to still get the leap second notification, so
> > some other fix would be better I guess.
>
> How about just moving the printk out of the lock? I.e. something like
> this:

Well, you've only fixed the leap second insertion case, not the
removal case. AFAIK we've never actually had a leap second removed, but
it could happen (and the code is already there), so it should be fixed
as well.

Also, I didn't notice the locking was right there in the ntp_leap_second
function in the 2.6.26.6 kernel I was looking at, because I've also been
looking at the 2.6.9-based RHEL 4 kernel (which is a good bit different;
the lock is held outside the function, so it wouldn't be easy to drop it
for the printk). I guess that's Red Hat's (and other long-term support
vendors') problem. The simplest thing for them is still probably to
just remove the printks.

Here's a patch that moves both printks outside the lock. I am unable to
make a kernel with this patch crash on a leap second insertion or
deletion.
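
For anyone who wants to see the shape of the problem without booting a
kernel, here is a rough user-space sketch. Everything in it is made up
for illustration: toy_lock is a single-threaded stand-in for xtime_lock
that returns EDEADLK on a second acquire instead of hanging,
log_notice() stands in for printk plus the wakeup path that wants the
current time, and fake_seconds stands in for xtime. It is not kernel
code and says nothing about how the seqlock really works; it only shows
why logging under the lock goes wrong and why recording a flag and
logging after the unlock does not.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Toy stand-in for xtime_lock (hypothetical, single-threaded).
 * A second acquire reports EDEADLK instead of spinning forever. */
struct toy_lock {
	int held;
};

static int toy_lock_acquire(struct toy_lock *lock)
{
	if (lock->held)
		return EDEADLK;
	lock->held = 1;
	return 0;
}

static void toy_lock_release(struct toy_lock *lock)
{
	assert(lock->held);
	lock->held = 0;
}

static struct toy_lock time_lock;	/* plays the role of xtime_lock */
static long fake_seconds = 100;		/* plays the role of xtime.tv_sec */

/* Stand-in for printk: it needs the current time, so it takes time_lock.
 * Returns 0 on success or the error from the lock. */
static int log_notice(const char *text)
{
	long now;
	int err = toy_lock_acquire(&time_lock);

	if (err)
		return err;
	now = fake_seconds;
	toy_lock_release(&time_lock);
	printf("[%ld] Clock: %s\n", now, text);
	return 0;
}

enum leap_msg { MSG_NONE, MSG_INSERT, MSG_DELETE };

/* Old shape: log while still holding the lock. */
int leap_log_inside_lock(void)
{
	int err = toy_lock_acquire(&time_lock);

	if (err)
		return err;
	fake_seconds--;
	err = log_notice("inserting leap second 23:59:60 UTC");
	toy_lock_release(&time_lock);
	return err;
}

/* Patched shape: only remember what happened, log after the unlock. */
int leap_log_after_unlock(enum leap_msg event)
{
	enum leap_msg msg = MSG_NONE;
	int err = toy_lock_acquire(&time_lock);

	if (err)
		return err;
	if (event == MSG_INSERT)
		fake_seconds--;
	else if (event == MSG_DELETE)
		fake_seconds++;
	msg = event;
	toy_lock_release(&time_lock);

	switch (msg) {
	case MSG_INSERT:
		return log_notice("inserting leap second 23:59:60 UTC");
	case MSG_DELETE:
		return log_notice("deleting leap second 23:59:59 UTC");
	default:
		return 0;
	}
}
```

In this toy, leap_log_inside_lock() returns EDEADLK (the polite version
of the hang), while leap_log_after_unlock(MSG_INSERT) and
leap_log_after_unlock(MSG_DELETE) return 0 and print the notice.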
--
Chris Adams <cmadams@xxxxxxxxxx>
Systems and Network Administrator - HiWAAY Internet Services
I don't speak for anybody but myself - that's enough trouble.


From: Chris Adams <cmadams@xxxxxxxxxx>

The code to handle leap seconds printks an informational message when
the second is inserted or deleted. It does this while holding
xtime_lock. However, printk wakes up klogd, and in some cases the
scheduler then tries to get the current kernel time, which means taking
xtime_lock again and results in a deadlock. This patch moves the
printks outside of the lock.

Signed-off-by: Chris Adams <cmadams@xxxxxxxxxx>
---
diff -urpN linux-2.6.28-git5-vanilla/kernel/time/ntp.c linux-2.6.28-git5/kernel/time/ntp.c
--- linux-2.6.28-git5-vanilla/kernel/time/ntp.c 2009-01-02 22:09:34.000000000 -0600
+++ linux-2.6.28-git5/kernel/time/ntp.c 2009-01-02 22:11:23.000000000 -0600
@@ -130,6 +130,7 @@ void ntp_clear(void)
 static enum hrtimer_restart ntp_leap_second(struct hrtimer *timer)
 {
 	enum hrtimer_restart res = HRTIMER_NORESTART;
+	int msg = 0;

 	write_seqlock(&xtime_lock);

@@ -140,8 +141,7 @@ static enum hrtimer_restart ntp_leap_sec
 		xtime.tv_sec--;
 		wall_to_monotonic.tv_sec++;
 		time_state = TIME_OOP;
-		printk(KERN_NOTICE "Clock: "
-		       "inserting leap second 23:59:60 UTC\n");
+		msg = 1;
 		hrtimer_add_expires_ns(&leap_timer, NSEC_PER_SEC);
 		res = HRTIMER_RESTART;
 		break;
@@ -150,8 +150,7 @@ static enum hrtimer_restart ntp_leap_sec
 		time_tai--;
 		wall_to_monotonic.tv_sec--;
 		time_state = TIME_WAIT;
-		printk(KERN_NOTICE "Clock: "
-		       "deleting leap second 23:59:59 UTC\n");
+		msg = 2;
 		break;
 	case TIME_OOP:
 		time_tai++;
@@ -166,6 +165,17 @@ static enum hrtimer_restart ntp_leap_sec

 	write_sequnlock(&xtime_lock);

+	switch (msg) {
+	case 1:
+		printk(KERN_NOTICE "Clock: "
+		       "inserting leap second 23:59:60 UTC\n");
+		break;
+	case 2:
+		printk(KERN_NOTICE "Clock: "
+		       "deleting leap second 23:59:59 UTC\n");
+		break;
+	}
+
 	return res;
 }
