Re: [PATCH] proc: make high precision system boot time available

From: Eric W. Biederman
Date: Sat Jun 29 2013 - 10:24:37 EST



Nacked-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>

The pach does not solve the problem described.

Sami Kerola <kerolasa@xxxxxx> writes:

> The kernel does not expose precise start time anywhere. Precision of
> sysinfo() is limited to a second. The /proc/uptime has precision of two
> decimals. Neither are good enough when klog messages are displayed with
> a log time stamp that is converted to human understandable format, such
> as ISO-8601. The problem is that imprecise knowledge about uptime will
> lead human readable klog time stamps to wiggle in user commands, such as
> dmesg(1).

Using an adjusted idea like boottime will also cause wiggle whenever the
clock is set, or we have a leap second. So I can't see this solving
anything.

> Additionally the current dmesg and journalctl(1) are disagreeing when
> kernel events occurred. The dmesg is using sysinfo(2) to determine boot
> time, while journalctl is using systemd(1)'s early clock_gettime(2) call.
> Here is demonstration how different baseline effects on a random laptop.
>
> $ dmesg --ctime | grep 'Initializing cgroup subsys cpuset'
> [Fri Jun 28 07:40:12 2013] Initializing cgroup subsys cpuset
> $ journalctl -b | grep 'Initializing cgroup subsys cpuset'
> Jun 28 07:40:21 kerolasa-home kernel: Initializing cgroup subsys cpuset

Here you are show nearly 10 seconds of difference in time. Subsecond
accuracy is not likely to help you here.

> To solve the wiggling problem, and to have a boot time coming from source
> which everyone can agree is 'good enough' estimation of system start up,
> the kernel needs to tell the boot time as precisily as it can.

Quite frankly none of this makes a lick of sense to me. Your deep
complaint is a human pretty printer is not matching the results of
another human pretty printer.

I fail to see how the limitations of pretty printers justifies
maintiaing yet another kernel interface. Especially for something so
dubious and fluctuating as the time we think the wall clock read when we
started up.

I can't imagine this interface doing anything except generating more
user complaints. A made up number with more digits after the decimal
point remains a made up number.

Eric


> CC: Karel Zak <kzak@xxxxxxxxxx>
> CC: Lennart Poettering <lennart@xxxxxxxxxxxxxx>
> CC: Kay Sievers <kay@xxxxxxxx>
> CC: Eric W. Biederman <ebiederm@xxxxxxxxxxxx>
> References: http://www.spinics.net/lists/util-linux-ng/msg08092.html
> Measurement: https://plus.google.com/u/0/117341944650953092607/posts/SHy5qu1HpSU
> Signed-off-by: Sami Kerola <kerolasa@xxxxxx>
> ---
> Documentation/ABI/testing/procfs-boottime | 15 ++++++++++++++
> fs/proc/Makefile | 1 +
> fs/proc/boottime.c | 33 +++++++++++++++++++++++++++++++
> 3 files changed, 49 insertions(+)
> create mode 100644 Documentation/ABI/testing/procfs-boottime
> create mode 100644 fs/proc/boottime.c
>
> diff --git a/Documentation/ABI/testing/procfs-boottime b/Documentation/ABI/testing/procfs-boottime
> new file mode 100644
> index 0000000..d5bf8f2
> --- /dev/null
> +++ b/Documentation/ABI/testing/procfs-boottime
> @@ -0,0 +1,15 @@
> +What: /proc/boottime
> +Date: June 2013
> +KernelVersion: 3.11
> +Contact: Sami Kerola <kerolasa@xxxxxx>
> +
> +Description: Show high precision time stamp when linux kernel thinks
> + the system start up. The value is suitable to be used as
> + struct timespec.
> +
> + struct timespec ts;
> + hpu = fopen("/sys/kernel/boottime", "r");
> + fscanf(hpu, "%ld.%ld", &ts.tv_sec, &ts.tv_nsec);
> +
> +Users: At least utilities that print klog entries in
> + times stamps converted for humans.
> diff --git a/fs/proc/Makefile b/fs/proc/Makefile
> index ab30716..5a7ec5a 100644
> --- a/fs/proc/Makefile
> +++ b/fs/proc/Makefile
> @@ -10,6 +10,7 @@ proc-$(CONFIG_MMU) := task_mmu.o
> proc-y += inode.o root.o base.o generic.o array.o \
> fd.o
> proc-$(CONFIG_TTY) += proc_tty.o
> +proc-y += boottime.o
> proc-y += cmdline.o
> proc-y += consoles.o
> proc-y += cpuinfo.o
> diff --git a/fs/proc/boottime.c b/fs/proc/boottime.c
> new file mode 100644
> index 0000000..f56c624
> --- /dev/null
> +++ b/fs/proc/boottime.c
> @@ -0,0 +1,33 @@
> +#include <linux/fs.h>
> +#include <linux/init.h>
> +#include <linux/proc_fs.h>
> +#include <linux/seq_file.h>
> +#include <linux/time.h>
> +
> +static int boottime_proc_show(struct seq_file *m, void *v)
> +{
> + struct timespec ts;
> + getboottime(&ts);
> + seq_printf(m, "%ld.%ld\n", ts.tv_sec, ts.tv_nsec);
> + return 0;
> +}
> +
> +static int boottime_proc_open(struct inode *inode, struct file *file)
> +{
> + return single_open(file, boottime_proc_show, NULL);
> +}
> +
> +static const struct file_operations boottime_proc_fops = {
> + .open = boottime_proc_open,
> + .read = seq_read,
> + .llseek = seq_lseek,
> + .release = single_release,
> +};
> +
> +static int __init proc_boottime_init(void)
> +{
> + proc_create("boottime", 0, NULL, &boottime_proc_fops);
> + return 0;
> +}
> +
> +module_init(proc_boottime_init);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/