Re: [RFC PATCH] posix-timers: Support delivery of signals to the current thread

From: Dmitry Vyukov
Date: Wed Jan 11 2023 - 10:50:54 EST


On Fri, 16 Dec 2022 at 18:18, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
>
> Support CLOCK_PROCESS_CPUTIME_ID timers with SIGEV_SIGNAL | SIGEV_THREAD_ID
> and sigev_notify_thread_id == 0, which sends the signal to the current
> thread that fires the timer. This is useful for relatively uniform sampling
> of process execution across all threads with signals.
>
> Signed-off-by: Dmitry Vyukov <dvyukov@xxxxxxxxxx>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: Marco Elver <elver@xxxxxxxxxx>
>
> ---
>
> We are trying to implement sampling program analysis based on this.
> We don't need 100% uniform sampling as a CPU profiler would need,
> but we still need the signals to be reasonably distributed across
> the process threads.
>
> Thomas, does the idea look sane to you overall?
> Are there any existing alternatives to this?

Hi Thomas,

I guess this was lost in your inbox during the holidays period.
Please take a look. Should I mail this as a non-RFC patch?

Thanks

> All alternatives we found are complex and/or have high overhead.
> E.g. we considered using CLOCK_PROCESS_CPUTIME_ID+SIGEV_SIGNAL timer
> plus inherited for all threads perf_event(PERF_COUNT_SW_TASK_CLOCK).
> When the timer fires we enable the perf event, and then use the signals
> from the perf event, and then disable the perf event.
> But this has considerable memory overhead (perf event per thread)
> and issues IPIs to all CPUs for perf event enable/disable.
>
> We also considered using CLOCK_PROCESS_CPUTIME_ID+SIGEV_SIGNAL timer
> and then manually scan /proc/self/task/ and select some task at random.
> But this is also complex and makes it hard to do reasonable sampling
> proportional to activity of threads.
>
> All alternatives are based on CLOCK_PROCESS_CPUTIME_ID in some way,
> and it seems that just a single CLOCK_PROCESS_CPUTIME_ID timer is enough
> if it could deliver signals to active threads (what this patch is doing).
> The analysis we are trying to do is intended for production systems
> so we are aiming at as low overhead as possible.
>
> If this idea looks sane to you in general, I will add tests and I am open
> to suggestions on the user API (should it be a new SIGEV_CURRENT_THREAD?)
> and on how to represent this in the struct k_itimer. E.g. we could keep
> it_pid=current but add a special flag that says to send the signal to
> the current task rather than it_pid. This has an advantage that we can
> add the following check to posix_timer_event()
> (which would be pretty bad to violate):
> WARN_ON(!same_thread_group(pid_task(timr->it_pid, PIDTYPE_PID), current));
> ---
> kernel/time/posix-timers.c | 23 ++++++++++++++++-------
> 1 file changed, 16 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
> index 5dead89308b74..411ba087e0699 100644
> --- a/kernel/time/posix-timers.c
> +++ b/kernel/time/posix-timers.c
> @@ -336,6 +336,7 @@ void posixtimer_rearm(struct kernel_siginfo *info)
> int posix_timer_event(struct k_itimer *timr, int si_private)
> {
> enum pid_type type;
> + struct pid *pid;
> int ret;
> /*
> * FIXME: if ->sigq is queued we can race with
> @@ -350,8 +351,9 @@ int posix_timer_event(struct k_itimer *timr, int si_private)
> */
> timr->sigq->info.si_sys_private = si_private;
>
> + pid = timr->it_pid ?: task_pid(current);
> type = !(timr->it_sigev_notify & SIGEV_THREAD_ID) ? PIDTYPE_TGID : PIDTYPE_PID;
> - ret = send_sigqueue(timr->sigq, timr->it_pid, type);
> + ret = send_sigqueue(timr->sigq, pid, type);
> /* If we failed to send the signal the timer stops. */
> return ret > 0;
> }
> @@ -428,27 +430,31 @@ static enum hrtimer_restart posix_timer_fn(struct hrtimer *timer)
> return ret;
> }
>
> -static struct pid *good_sigevent(sigevent_t * event)
> +static struct pid *good_sigevent(sigevent_t *event, clockid_t which_clock)
> {
> struct pid *pid = task_tgid(current);
> struct task_struct *rtn;
>
> switch (event->sigev_notify) {
> case SIGEV_SIGNAL | SIGEV_THREAD_ID:
> + /* This will use the current task for signals. */
> + if (which_clock == CLOCK_PROCESS_CPUTIME_ID &&
> + !event->sigev_notify_thread_id)
> + return NULL;
> pid = find_vpid(event->sigev_notify_thread_id);
> rtn = pid_task(pid, PIDTYPE_PID);
> if (!rtn || !same_thread_group(rtn, current))
> - return NULL;
> + return ERR_PTR(-ENOENT);
> fallthrough;
> case SIGEV_SIGNAL:
> case SIGEV_THREAD:
> if (event->sigev_signo <= 0 || event->sigev_signo > SIGRTMAX)
> - return NULL;
> + return ERR_PTR(-EINVAL);
> fallthrough;
> case SIGEV_NONE:
> return pid;
> default:
> - return NULL;
> + return ERR_PTR(-EINVAL);
> }
> }
>
> @@ -502,6 +508,7 @@ static int do_timer_create(clockid_t which_clock, struct sigevent *event,
> struct k_itimer *new_timer;
> int error, new_timer_id;
> int it_id_set = IT_ID_NOT_SET;
> + struct pid *pid;
>
> if (!kc)
> return -EINVAL;
> @@ -527,9 +534,11 @@ static int do_timer_create(clockid_t which_clock, struct sigevent *event,
>
> if (event) {
> rcu_read_lock();
> - new_timer->it_pid = get_pid(good_sigevent(event));
> + pid = good_sigevent(event, which_clock);
> + if (!IS_ERR(pid))
> + new_timer->it_pid = get_pid(pid);
> rcu_read_unlock();
> - if (!new_timer->it_pid) {
> + if (IS_ERR(pid)) {
> error = -EINVAL;
> goto out;
> }
>
> base-commit: 041fae9c105ae342a4245cf1e0dc56a23fbb9d3c
> --
> 2.39.0.314.g84b9a713c41-goog
>