Re: [PATCH 0/4] nolibc: add support for the s390 platform

From: Willy Tarreau
Date: Tue Jan 10 2023 - 12:55:03 EST


On Tue, Jan 10, 2023 at 08:32:10AM -0800, Paul E. McKenney wrote:
> On Tue, Jan 10, 2023 at 05:12:49PM +0100, Willy Tarreau wrote:
> > On Tue, Jan 10, 2023 at 06:53:34AM -0800, Paul E. McKenney wrote:
> > > Here is one of them, based on both the fixes and Sven's s390 support.
> > > Please let me know if you need any other combination.
> >
> > Thanks, here's the problem:
> >
> > > 0 getpid = 1 [OK]
> > > 1 getppid = 0 [OK]
> > > 3 gettid = 1 [OK]
> > > 5 getpgid_self = 0 [OK]
> > > 6 getpgid_bad = -1 ESRCH [OK]
> > > 7 kill_0[ 1.940442] tsc: Refined TSC clocksource calibration: 2399.981 MHz
> > > [ 1.942334] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x229825a5278, max_idle_ns: 440795306804 ns
> > > = 0 [OK]
> > > 8 kill_CONT = 0 [ 1.944987] clocksource: Switched to clocksource tsc
> > > [OK]
> > > 9 kill_BADPID = -1 ESRCH [OK]
> > (...)
> >
> > It's clear that "grep -c ^[0-9].*OK" will not count all of them (2 are
> > indeed missing).
> >
> > We could probably start with "quiet" but that would be against the
> > principle of using this to troubleshoot issues. I think we just stick
> > to the current search of "FAIL" and that as long as a success is
> > reported and the number of successes is within the expected range
> > that could be OK. At least I guess :-/
>
> Huh. Would it make sense to delay the start of the nolibc testing by a
> few seconds in order to avoid this sort of thing? Or would that cause
> other problems?

That would be quite annoying. A delay is never long enough for some
issues, and too long for the majority of cases where there is no issue.
I'd suggest that we just rely on the fail count for now (as it is); that
will allow us to collect a larger variety of discrepancies and probably
figure out a better solution at some point. For example, if we find that
it's always the TSC that does this, maybe starting x86 with notsc will
be a good fix.
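
To illustrate why the line-anchored success count comes up short while
the FAIL search is unaffected, here is a minimal Python sketch run over
the excerpt quoted above. The function name count_results and the
token-based "[OK]" count are assumptions made for illustration; they are
not part of the nolibc test scripts.

```python
import re

# Excerpt of the console output quoted above, with kernel messages
# interleaved into two of the test lines.
SAMPLE_LOG = """\
0 getpid = 1 [OK]
1 getppid = 0 [OK]
3 gettid = 1 [OK]
5 getpgid_self = 0 [OK]
6 getpgid_bad = -1 ESRCH [OK]
7 kill_0[ 1.940442] tsc: Refined TSC clocksource calibration: 2399.981 MHz
[ 1.942334] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x229825a5278, max_idle_ns: 440795306804 ns
= 0 [OK]
8 kill_CONT = 0 [ 1.944987] clocksource: Switched to clocksource tsc
[OK]
9 kill_BADPID = -1 ESRCH [OK]
"""

# Same pattern as the grep discussed in the thread.
OK_LINE = re.compile(r"^[0-9].*OK")


def count_results(log):
    """Count successes and failures in a test log (hypothetical helper)."""
    lines = log.splitlines()
    return {
        # Lines that start with a test number and contain OK.
        "ok_lines": sum(1 for line in lines if OK_LINE.search(line)),
        # Every "[OK]" marker, wherever it ended up on the line.
        "ok_tokens": log.count("[OK]"),
        # Lines containing FAIL, like "grep -c FAIL".
        "fail": sum(1 for line in lines if "FAIL" in line),
    }


if __name__ == "__main__":
    print(count_results(SAMPLE_LOG))
```

On this excerpt the anchored pattern matches 6 of the 8 tests, the two
split by kernel messages being the ones missed, while the FAIL count
stays at 0 regardless of the interleaving.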

Regards,
Willy