Re: Rseq registration: Google tcmalloc vs glibc

From: Mathieu Desnoyers
Date: Wed Feb 26 2020 - 12:01:52 EST


----- On Feb 25, 2020, at 10:38 PM, Chris Kennelly ckennelly@xxxxxxxxxx wrote:

> On Tue, Feb 25, 2020 at 10:25 PM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
>>
>> On Fri, Feb 21, 2020 at 11:13 AM Mathieu Desnoyers
>> <mathieu.desnoyers@xxxxxxxxxxxx> wrote:
>> >
>> > ----- On Feb 21, 2020, at 10:49 AM, Joel Fernandes, Google
>> > joel@xxxxxxxxxxxxxxxxx wrote:
>> >
>> > [...]
>> > >>
>> > >> 3) Use the __rseq_abi TLS cpu_id field to know whether Rseq has been
>> > >> registered.
>> > >>
>> > >> - Current protocol in the most recent glibc integration patch set.
>> > >> - Not supported yet by Linux kernel rseq selftests,
>> > >> - Not supported yet by tcmalloc,
>> > >>
>> > >> Use the per-thread state to figure out whether each thread needs to register
>> > >> Rseq individually.
>> > >>
>> > >> Works for integration between a library which exists for the entire lifetime
>> > >> of the executable (e.g. glibc) and other libraries. However, it does not
>> > >> allow a set of libraries which are dlopen'd/dlclose'd to co-exist without
>> > >> having a library like glibc handling the registration present.
>> > >
>> > > Mathieu, could you share more details about why during dlopen/close
>> > > libraries we cannot use the same __rseq_abi TLS to detect that rseq was
>> > > registered?
>> >
>> > Sure,
>> >
>> > A library which is only loaded and never closed during the execution of the
>> > program can let the kernel implicitly unregister rseq at thread exit. For
>> > the dlopen/dlclose use-case, we need to be able to explicitly unregister
>> > each thread's __rseq_abi which sits in a library which is going to be
>> > dlclose'd.
>>
>> Mathieu, Thanks a lot for the explanation, it makes complete sense. It
>> sounds from Chris's reply that tcmalloc already checks
>> __rseq_abi.cpu_id and is not dlopened/closed. Considering these, it
>> seems to already handle things properly - CMIIW.
>
> I'll make a note about this, since we can probably benefit from some
> more comments about the assumptions/invariants the fastpath uses.

I suspect the integration with glibc and with dlopen'd/dlclose'd libraries will not
behave correctly with the current tcmalloc implementation.

Based on the tcmalloc code-base, InitFastPerCpu is only called from IsFast. As long
as this is the only expected caller, having IsFast check RseqCpuId is enough to detect
whether glibc (or some other library) has already registered rseq for the current
thread.
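
To make the cpu_id check concrete, here is a minimal sketch of the detection.
It is not the tcmalloc code: sketch_cpu_id is a hypothetical stand-in for the
cpu_id field of the per-thread __rseq_abi, and the function names are made up
for illustration. The two negative constants are the ones defined by the Linux
rseq UAPI header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values from the Linux rseq UAPI header (enum rseq_cpu_id_state). */
#define RSEQ_CPU_ID_UNINITIALIZED       (-1)
#define RSEQ_CPU_ID_REGISTRATION_FAILED (-2)

/* Hypothetical stand-in for __rseq_abi.cpu_id; not the real TLS symbol. */
static _Thread_local int32_t sketch_cpu_id = RSEQ_CPU_ID_UNINITIALIZED;

/* A non-negative cpu_id means rseq is registered for this thread,
 * regardless of which library performed the registration. */
static bool sketch_rseq_registered(void)
{
    return sketch_cpu_id >= 0;
}

/* Walks through the three states the field can be in. */
static void sketch_check_states(void)
{
    sketch_cpu_id = RSEQ_CPU_ID_UNINITIALIZED;
    assert(!sketch_rseq_registered());

    sketch_cpu_id = 3;   /* the kernel would publish the current CPU number */
    assert(sketch_rseq_registered());

    sketch_cpu_id = RSEQ_CPU_ID_REGISTRATION_FAILED;
    assert(!sketch_rseq_registered());
}
```

The check is a single TLS load and compare, which is why it is attractive on
the fast path.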

However, if the application chooses to invoke InitFastPerCpu() directly, things become
unexpected, because it invokes:

absl::base_internal::LowLevelCallOnce(&init_per_cpu_once, InitPerCpu);

which AFAIU invokes InitPerCpu only once during the execution of the current program.
The relevant code is:

static bool InitThreadPerCpu() {
  if (__rseq_refcount++ > 0) {
    return true;
  }

  auto ret = syscall(__NR_rseq, &__rseq_abi, sizeof(__rseq_abi), 0,
                     PERCPU_RSEQ_SIGNATURE);
  if (ret == 0) {
    return true;
  } else {
    __rseq_refcount--;
  }

  return false;
}

static void InitPerCpu() {
  // Based on the results of successfully initializing the first thread, mark
  // init_status to initialize all subsequent threads.
  if (InitThreadPerCpu()) {
    init_status = kFastMode;
  }
}

In a scenario where glibc has already registered Rseq, the __rseq_refcount will
be incremented, the __NR_rseq syscall will fail with -1, errno=EBUSY, so the refcount
will be immediately decremented and it will return false. Therefore, "init_status" will
never be set to kFastMode, leaving it in kSlowMode for the entire lifetime of this
program. That being said, even though this state can come as a surprise, it seems to
be entirely bypassed by the fast-paths IsFast() and IsFastNoInit(), so maybe it won't
have any observable side-effects other than leaving init_status in a state that does not
match reality.
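
The sequence above can be replayed with a small user-space model. This is only
a sketch: every sketch_* name is hypothetical, the rseq system call is replaced
by a stub that fails with EBUSY when the thread is already registered, and I
assume glibc registered without touching the refcount. The control flow of
sketch_init_thread_per_cpu() mirrors the quoted InitThreadPerCpu().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum sketch_status { SKETCH_SLOW_MODE, SKETCH_FAST_MODE };

static bool sketch_kernel_registered;  /* models the kernel's per-thread state */
static int sketch_refcount;            /* models __rseq_refcount */
static enum sketch_status sketch_init_status = SKETCH_SLOW_MODE;

/* Stub standing in for syscall(__NR_rseq, ...): a second registration
 * for the same thread fails with -1 and errno set to EBUSY. */
static int sketch_rseq_register(void)
{
    if (sketch_kernel_registered) {
        errno = EBUSY;
        return -1;
    }
    sketch_kernel_registered = true;
    return 0;
}

/* Same control flow as the quoted InitThreadPerCpu(). */
static bool sketch_init_thread_per_cpu(void)
{
    if (sketch_refcount++ > 0)
        return true;
    if (sketch_rseq_register() == 0)
        return true;
    sketch_refcount--;
    return false;
}

/* Same control flow as the quoted InitPerCpu(). */
static void sketch_init_per_cpu(void)
{
    if (sketch_init_thread_per_cpu())
        sketch_init_status = SKETCH_FAST_MODE;
}

/* Replays the scenario: glibc registered first, refcount still at zero. */
static void sketch_replay_glibc_first(void)
{
    sketch_kernel_registered = true;
    sketch_refcount = 0;
    sketch_init_status = SKETCH_SLOW_MODE;

    sketch_init_per_cpu();

    assert(sketch_refcount == 0);                    /* incremented, then decremented */
    assert(sketch_init_status == SKETCH_SLOW_MODE);  /* never reaches fast mode */
}
```

With sketch_kernel_registered left false, the same call path sets the status
to fast mode and leaves the refcount at 1, which is the case the code was
written for.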

In the other use-case where tcmalloc co-exists with a dlopened/dlclosed library, but glibc
does not provide Rseq registration, we run into issues as well if the dlopened library
registers rseq first for a given thread. IsFastNoInit() expects that if Rseq has been
observed as registered in the past for a thread, it stays registered. However, if a
dlclosed library unregisters Rseq, we need to be prepared to re-register it. So either
tcmalloc needs to express its use of Rseq by incrementing __rseq_refcount even when Rseq
is registered (this would hurt the fast-path however, and I would hate to have to do this),
or tcmalloc needs to be able to handle the fact that Rseq may be unregistered by a dlclosed
library which was the actual owner of the Rseq registration.
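
As a rough illustration of the second option, here is one possible shape for a
fast path that tolerates unregistration by another library. It is a
single-threaded model with hypothetical sketch_* names, not a proposal for the
actual tcmalloc code, and it deliberately ignores the question of which object
owns the __rseq_abi storage. The only assumption taken from the kernel is that
unregistering resets cpu_id to RSEQ_CPU_ID_UNINITIALIZED.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RSEQ_CPU_ID_UNINITIALIZED (-1)

static int32_t sketch_cpu_id = RSEQ_CPU_ID_UNINITIALIZED; /* models __rseq_abi.cpu_id */
static int sketch_register_calls;                         /* counts registrations */

/* Models a successful registration: the kernel starts publishing a CPU number. */
static bool sketch_register(void)
{
    sketch_register_calls++;
    sketch_cpu_id = 0;
    return true;
}

/* Models the dlclosed library unregistering: the kernel resets the field. */
static void sketch_unregister(void)
{
    sketch_cpu_id = RSEQ_CPU_ID_UNINITIALIZED;
}

/* Fast path that never caches "was registered once": it re-reads cpu_id on
 * every call and falls back to registration when the field was reset. */
static bool sketch_is_fast(void)
{
    if (sketch_cpu_id >= 0)
        return true;
    return sketch_register();
}

/* Replays: library registers, allocator uses rseq, library unregisters. */
static void sketch_replay_dlclose(void)
{
    sketch_cpu_id = RSEQ_CPU_ID_UNINITIALIZED;
    sketch_register_calls = 0;

    sketch_register();                   /* dlopened library registers first */
    assert(sketch_is_fast());
    assert(sketch_register_calls == 1);  /* allocator did not register again */

    sketch_unregister();                 /* library unregisters before dlclose */
    assert(sketch_is_fast());
    assert(sketch_register_calls == 2);  /* allocator had to re-register */
}
```

The cost is that the registered case still needs the cpu_id load and compare on
every call, but it avoids the refcount increment on the fast path.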

Thanks,

Mathieu


--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com