[PATCH] tracing: Fix wraparound problems in "uptime" tracer

From: Tony Luck
Date: Mon Jun 30 2014 - 14:17:27 EST


There seem to be no non-racy solutions ... I've been wondering
about giving up on a generic jiffies_to_nsec() function because
people might use it in cases where the races might be likley to
bite them. For my need, I think that "perfect is the enemy of good":

1) The race window is only a few microseconds wide
2) It only exists on 32-bit kernels - which are dying out on server
systems because they can't handle the amounts of memory on modern
machines.
3) It opens every 49 days (on a HZ=1000 system)
4) I'm logging error events that happen at a "per-month" frequency (or lower)
5) If the race does happen - the visible result is that we have a
bad time logged against an error event.

so what about this: ...

From: Tony Luck <tony.luck@xxxxxxxxx>

The "uptime" tracer added in:
commit 8aacf017b065a805d27467843490c976835eb4a5
tracing: Add "uptime" trace clock that uses jiffies
has wraparound problems when the system has been up more
than 1 hour 11 minutes and 34 seconds. It converts jiffies
to nanoseconds using:
(u64)jiffies_to_usecs(jiffy) * 1000ULL
but since jiffies_to_usecs() only returns a 32-bit value, it
truncates at 2^32 microseconds. An additional problem on 32-bit
systems is that the argument is "unsigned long", so fixing the
return value only helps until 2^32 jiffies (49.7 days on a HZ=1000
system).

We can't provide a full features jiffies_to_nsec() function in
any safe way (32-bit systems need locking to read the full 64-bit
jiffies value). Just do the best we can here and recognise that
32-bit systems may seem some timestamp anomolies if jiffies64
was in the middle of rolling over a 2^32 boundary.

Signed-off-by: Tony Luck <tony.luck@xxxxxxxxx>
---
kernel/timeconst.bc | 6 ++++++
kernel/trace/trace_clock.c | 10 ++++++++--
2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/kernel/timeconst.bc b/kernel/timeconst.bc
index 511bdf2cafda..a5fef7a7fb27 100644
--- a/kernel/timeconst.bc
+++ b/kernel/timeconst.bc
@@ -100,6 +100,12 @@ define timeconst(hz) {
print "#define USEC_TO_HZ_DEN\t\t", 1000000/cd, "\n"
print "\n"

+ obase=10
+ cd=gcd(hz,1000000000)
+ print "#define HZ_TO_NSEC_NUM\t\t", 1000000000/cd, "\n"
+ print "#define HZ_TO_NSEC_DEN\t\t", hz/cd, "\n"
+ print "\n"
+
print "#endif /* KERNEL_TIMECONST_H */\n"
}
halt
diff --git a/kernel/trace/trace_clock.c b/kernel/trace/trace_clock.c
index 26dc348332b7..dc5b11b9f8a4 100644
--- a/kernel/trace/trace_clock.c
+++ b/kernel/trace/trace_clock.c
@@ -59,13 +59,19 @@ u64 notrace trace_clock(void)

/*
* trace_jiffy_clock(): Simply use jiffies as a clock counter.
+ * This usage of jiffies_64 isn't safe on 32-bit, but we may be
+ * called from NMI context, and we have no safe way to get a timestamp.
*/
u64 notrace trace_clock_jiffies(void)
{
- u64 jiffy = jiffies - INITIAL_JIFFIES;
+ u64 jiffy = jiffies_64 - INITIAL_JIFFIES;

/* Return nsecs */
- return (u64)jiffies_to_usecs(jiffy) * 1000ULL;
+#if !(NSEC_PER_SEC % HZ)
+ return (NSEC_PER_SEC / HZ) * jiffy;
+#else
+ return (jiffy * HZ_TO_NSEC_NUM) / HZ_TO_NSEC_DEN;
+#endif
}

/*
--
1.8.4.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/