Re: [PATCH v2 3/3] X86: Add a thread cpu time implementation to vDSO

From: Shaohua Li
Date: Mon Jan 05 2015 - 18:24:24 EST


On Fri, Jan 02, 2015 at 09:47:29AM -0800, Andy Lutomirski wrote:
> On Thu, Jan 1, 2015 at 6:59 PM, Shaohua Li <shli@xxxxxx> wrote:
> > On Fri, Dec 19, 2014 at 06:03:34PM +0100, Peter Zijlstra wrote:
> >> On Fri, Dec 19, 2014 at 08:48:07AM -0800, Andy Lutomirski wrote:
> >> > On Fri, Dec 19, 2014 at 3:23 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >> > > On Thu, Dec 18, 2014 at 04:22:59PM -0800, Andy Lutomirski wrote:
> >> > >> Bad news: this patch is incorrect, I think. Take a look at
> >> > >> update_rq_clock -- it does fancy things involving irq time and
> >> > >> paravirt steal time. So this patch could result in extremely
> >> > >> non-monotonic results.
> >> > >
> >> > > Yeah, I'm not sure how (and if) we could make all that work :/
> >> >
> >> > I obviously can't comment on what Facebook needs, but if I were
> >> > rigging something up to profile my own code*, I'd want a count of
> >> > elapsed time, including user, system, and probably interrupt as well.
> >> > I would probably not want to count time during which I'm not
> >> > scheduled, and I would also probably not want to count steal time.
> >> > The latter makes any implementation kind of nasty.
> >> >
> >> > The API presumably doesn't need to be any particular clock id for
> >> > clock_gettime, and it may not even need to be clock_gettime at all.
> >> >
> >> > Is perf self-monitoring good enough for this? If not, can we make it
> >> > good enough?
> >>
> >> Yeah, I think you should be able to use that. You could count a NOP
> >> event and simply use its activated time. We have PERF_COUNT_SW_DUMMY for
> >> such purposes iirc.
> >>
> >> The advantage of using perf self profiling is that it (obviously)
> >> extends to more than just walltime.
> >
> > Hi Peter & Andy,
> > I'm wondering how we could use perf to implement a clock_gettime.
> > Reading the perf fd or using ioctl is slow, so reading the mmap
> > ringbuffer is the only option. But as far as I know the ringbuffer has
> > data only when an event is generated. Between two events, there is
> > nothing we can read from the ringbuffer. Then how can an application get
> > time info in the interval?
>
> Don't use the ringbuffer. Instead, use a counting event, mmap it, and
> look at struct perf_event_mmap_page's comments to see how to read the
> time stamps.
>
> There's some code here that does this:
>
> https://github.com/andikleen/pmu-tools
>
> but you won't actually need the rdpmc part, since you just want
> overall times instead of hardware event counts.

Good, it works. But the timestamps (.time_running and friends) only get
updated for real hardware events between context switches. For software
events, the timestamp is initialized once and then never updated, so if I
use it to get time, I actually get CLOCK_MONOTONIC. Hardware events work
well here, but depending on a hardware event is too tricky, and I'd like
to avoid it.
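
For reference, a minimal sketch of the read side discussed above, following
the seqlock loop documented in the struct perf_event_mmap_page comments.
The helper names open_dummy_counter() and read_times() are hypothetical,
and the cap_user_time/rdtsc delta adjustment from those comments is left
out, so this is an illustration under those assumptions, not tested code
from this thread:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/perf_event.h>

/* Compiler barrier, as in the perf_event_mmap_page read loop. */
#define barrier() __asm__ __volatile__("" ::: "memory")

/*
 * Hypothetical helper: open a counting PERF_COUNT_SW_DUMMY event on the
 * calling thread and map its control page. Returns NULL on failure.
 */
struct perf_event_mmap_page *open_dummy_counter(void)
{
	struct perf_event_attr attr;
	void *p;
	int fd;

	memset(&attr, 0, sizeof(attr));
	attr.type = PERF_TYPE_SOFTWARE;
	attr.size = sizeof(attr);
	attr.config = PERF_COUNT_SW_DUMMY;

	/* pid = 0, cpu = -1: this thread, on any CPU. */
	fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
	if (fd < 0)
		return NULL;

	/* Only the first (control) page is mapped, no ring buffer pages. */
	p = mmap(NULL, sysconf(_SC_PAGESIZE), PROT_READ, MAP_SHARED, fd, 0);
	close(fd);	/* the mapping keeps the event alive */
	return p == MAP_FAILED ? NULL : p;
}

/*
 * Hypothetical helper: take a consistent snapshot of time_enabled and
 * time_running by retrying while pc->lock changes under us.
 */
void read_times(const struct perf_event_mmap_page *pc,
		uint64_t *enabled, uint64_t *running)
{
	const volatile struct perf_event_mmap_page *p = pc;
	uint32_t seq;

	assert(pc != NULL);
	do {
		seq = p->lock;
		barrier();
		*enabled = p->time_enabled;
		*running = p->time_running;
		barrier();
	} while (p->lock != seq);
}
```

With a software event such as PERF_COUNT_SW_DUMMY, the values returned by
read_times() are the ones that, as noted above, are not refreshed while
the thread runs.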

Thanks,
Shaohua