Re: [PATCH 1/3] glibc: Perform rseq registration at C startup and thread creation (v22)

From: Carlos O'Donell
Date: Fri Jul 03 2020 - 13:51:09 EST


On 7/2/20 10:46 AM, Florian Weimer wrote:
> * Mathieu Desnoyers via Libc-alpha:
>
>> Register rseq TLS for each thread (including main), and unregister for
>> each thread (excluding main). "rseq" stands for Restartable Sequences.
>>
>> See the rseq(2) man page proposed here:
>> https://lkml.org/lkml/2018/9/19/647
>>
>> Those are based on glibc master branch commit 3ee1e0ec5c.
>> The rseq system call was merged into Linux 4.18.
>>
>> The TLS_STATIC_SURPLUS define is increased to leave additional room for
>> dlopen'd initial-exec TLS, which keeps elf/tst-auditmany working.
>>
>> The increase (76 bytes) is larger than 32 bytes because it has not been
>> increased in quite a while. The cost in terms of additional TLS storage
>> is quite significant, but it will also obscure some initial-exec-related
>> dlopen failures.
>
> We need another change to get this working on most non-x86
> architectures:
>
> diff --git a/elf/dl-tls.c b/elf/dl-tls.c
> index 817bcbbf59..ca13778ca9 100644
> --- a/elf/dl-tls.c
> +++ b/elf/dl-tls.c
> @@ -134,6 +134,12 @@ void
> _dl_determine_tlsoffset (void)
> {
> size_t max_align = TLS_TCB_ALIGN;
> + /* libc.so with rseq has TLS with 32-byte alignment. Since TLS is
> + initialized before audit modules are loaded and slotinfo
> + information is available, this is not taken into account below in
> + the audit case. */
> + max_align = MAX (max_align, 32U);
> +
> size_t freetop = 0;
> size_t freebottom = 0;
>
> This isn't visible on x86-64 because TLS_TCB_ALIGN is already 64 there.
>
> I plan to re-test with this fix and push the series.
>
> Carlos, is it okay if I fold in the dl-tls.c change if testing looks
> good?

I have reviewed the above and I think it is the correct *pragmatic* fix.

The reality is that to fix this fully you must use a two stage loading
process to pre-examine all audit modules *before* setting the fundamental
alignment of the TCB. This isn't easy with the current loader framework.
Therefore the above is a good pragmatic solution.

There is always going to be a bit of a chicken and an egg situation.
We want to provide a fundamental alignment requirement but we haven't
yet seen all the requirements on alignment. So the best we could do is
look over DT_NEEDED, DT_AUDIT, LD_PRELOAD, etc. get the best answer
and then fail any subsequent dlopen's that load objects with higher
fundamental requirements for alignment of the TCB.

The audit modules are problematic becuase they are loaded *before*
anything else is loaded, *before* we've examined any of the actual
objects we're about to load because they can influence the search
paths. Again, this means the above solution is a perfect pragmatic
choice. The real solution is to rearchitect the early audit module
loading into two stages and that's work we can do later.

OK with the above change.

--
Cheers,
Carlos.