Re: [RFC] PATCH 0/4 - Time virtualization
From: Eric W. Biederman
Date: Wed Apr 19 2006 - 04:27:04 EST
Jeff Dike <jdike@xxxxxxxxxxx> writes:
> This set of patches implements
> time virtualization by creating a time namespace
> an interface to it through unshare
> a ptrace extension to allow UML to take advantage of this
> UML support
> The guts of the namespace is just an offset from the system time. Within
> the container, gettimeofday adds this offset to the system time. settimeofday
> changes the offset without touching the system time. As such, within a
> namespace, settimeofday is unprivileged.
> The interface to it is through unshare(CLONE_TIME). This creates the new
> namespace, initialized with a zero offset from the system time.
> The advantage of this for UML is that it can create a time namespace for itself
> and subsequently let its process' gettimeofday run on the host, without
> being intercepted and run inside UML. As such, it should basically run at
> native speed.
> In order to allow this, we need selective system call interception. The
> third patch implements PTRACE_SYSCALL_MASK, which specifies, through a
> bitmask, which system calls are intercepted and which aren't.
That patch should probably be separated, from the rest.
But it looks like a fairly sane idea.
> Finally, the UML support is straightforward. It calls unshare(CLONE_TIME)
> to create the new namespace, sets gettimeofday to run without being
> intercepted, and makes settimeofday call the host's settimeofday instead
> of maintaining the time offset itself.
I think you missed a couple essential things to a time namespace.
Timers. The posix timers, in particular. The worst
of those is the monotonic timer.
In the case of migration the ugly case to properly handle is the
monotonic timer. That needs an offset yet it is absolutely forbidden
to provide that offset from the inside. So this is the one namespace
that I think is inappropriate to use sys_unshare to create.
We need a system call so that we can specify the minimum or the
starting monotonic time base.
I don't know how we want to describe time while a process is not
inside of a kernel.
> As expected, a gettimeofday loop runs basically at native speed. The two
> quick tests I did had it running inside UML at 98.8 and 99.2 % of
> BUG - as I was writing this, I realized that refcounting of the time_ns
> structures is wrong - they need to be incremented at process creation and
> decremented at process exit.
Actually the more I think of the using PTRACE to help with some of
these issues the more I like it. It's only real alternative is a
security module, and that must be written as kernel code.
The reference counting is terrible, as you don't free syscall_mask,
As a comparison what is the overhead if you don't use syscall_mask,
and just do a ptrace_cont on the system call you want to let through?
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/