Re: [PATCH 0/4] nolibc: add support for the s390 platform

From: Willy Tarreau
Date: Wed Jan 11 2023 - 01:51:56 EST


On Wed, Jan 11, 2023 at 07:45:05AM +0100, Sven Schnelle wrote:
> "Paul E. McKenney" <paulmck@xxxxxxxxxx> writes:
>
> > On Tue, Jan 10, 2023 at 05:12:49PM +0100, Willy Tarreau wrote:
> >> On Tue, Jan 10, 2023 at 06:53:34AM -0800, Paul E. McKenney wrote:
> >> > Here is one of them, based on both the fixes and Sven's s390 support.
> >> > Please let me know if you need any other combination.
> >>
> >> Thanks, here's the problem:
> >>
> >> > 0 getpid = 1 [OK]
> >> > 1 getppid = 0 [OK]
> >> > 3 gettid = 1 [OK]
> >> > 5 getpgid_self = 0 [OK]
> >> > 6 getpgid_bad = -1 ESRCH [OK]
> >> > 7 kill_0[ 1.940442] tsc: Refined TSC clocksource calibration: 2399.981 MHz
> >> > [ 1.942334] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x229825a5278, max_idle_ns: 440795306804 ns
> >> > = 0 [OK]
> >> > 8 kill_CONT = 0 [ 1.944987] clocksource: Switched to clocksource tsc
> >> > [OK]
> >> > 9 kill_BADPID = -1 ESRCH [OK]
> >> (...)
> >>
> >> It's clear that "grep -c ^[0-9].*OK" will not count all of them (2 are
> >> indeed missing).
> >>
> >> We could probably start with "quiet" but that would be against the
> >> principle of using this to troubleshoot issues. I think we just stick
> >> to the current search of "FAIL" and that as long as a success is
> >> reported and the number of successes is within the expected range
> >> that could be OK. At least I guess :-/
> >
> > Huh. Would it make sense to delay the start of the nolibc testing by a
> > few seconds in order to avoid this sort of thing? Or would that cause
> > other problems?
>
> Or define a second serial port (or something similar) in qemu and run the
> kernel console on ttyS0, and the init process writes to /dev/ttyS1? So the
> output of the test program could be redirected to a file on the host?

That's an option I hadn't thought about, but it could also make collecting
the output a bit less convenient. Also, init runs on the main console by
default, and I'm not sure how we could redirect it to a different one
before the initial execve() (so that we're certain not to miss anything).
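
As a rough illustration of the second-port idea, init could re-point its
own stdout and stderr as the very first thing in main(). This is only a
sketch: redirect_fd() is a made-up helper, /dev/ttyS1 is assumed to exist
by the time it runs, and it uses plain open()/dup2() rather than anything
nolibc-test does today. It also runs after execve(), not before it, so it
only guarantees that the test's own output is not missed.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: make target_fd refer to the device or file
 * named by path. Returns 0 on success, -1 on failure.
 */
int redirect_fd(const char *path, int target_fd)
{
    /* O_NOCTTY: do not let the port become our controlling terminal */
    int fd = open(path, O_WRONLY | O_NOCTTY);

    if (fd < 0)
        return -1;

    if (fd != target_fd) {
        /* dup2() atomically replaces whatever target_fd pointed to */
        if (dup2(fd, target_fd) < 0) {
            close(fd);
            return -1;
        }
        close(fd);
    }
    return 0;
}
```

Calling redirect_fd("/dev/ttyS1", 1) and redirect_fd("/dev/ttyS1", 2)
before the first test would send the results to the second port while
kernel messages stay on ttyS0.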

I've seen situations where I was happy to have the two together: when a
call you perform causes an oops, it's much easier to debug if the oops
immediately follows the test name.
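
If both streams stay on the same console, the result check has to tolerate
kernel messages landing in the middle of a test line. A minimal sketch of
such a check, assuming successes are marked "[OK]" and failures contain
"FAIL" as in the output quoted above. count_token() and run_passed() are
hypothetical names, not part of the existing scripts:

```c
#include <stddef.h>
#include <string.h>

/* Count non-overlapping occurrences of token anywhere in log. */
size_t count_token(const char *log, const char *token)
{
    size_t len = strlen(token);
    size_t n = 0;
    const char *p;

    if (len == 0)
        return 0;

    /* No line anchoring: a match counts wherever it appears */
    for (p = strstr(log, token); p; p = strstr(p + len, token))
        n++;
    return n;
}

/*
 * Return 1 if the log contains no "FAIL" and the number of "[OK]"
 * markers lies within [min_ok, max_ok], else 0.
 */
int run_passed(const char *log, size_t min_ok, size_t max_ok)
{
    size_t ok = count_token(log, "[OK]");

    return count_token(log, "FAIL") == 0 && ok >= min_ok && ok <= max_ok;
}
```

Because the match is not anchored to the start of a line, an "[OK]" that
got pushed onto its own line by a kernel message is still counted.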

I'm still inclined to think that only the tsc messages are an annoyance
at the moment, since they're reported after startup. If this continues,
we could also imagine reconfiguring the console to start logging at level 4
minimum, for example, so that they no longer show up there.
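
One way to do that without touching the test logic would be the loglevel=4
kernel command line parameter. Another, sketched below, is to have init
write the level to /proc/sys/kernel/printk early on. The helper name
set_console_loglevel() is made up for illustration, and the sketch assumes
/proc is already mounted and that libc's snprintf() is available:

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: write level as a decimal number to path.
 * With path = "/proc/sys/kernel/printk" this sets the console
 * log level. Returns 0 on success, -1 on failure.
 */
int set_console_loglevel(const char *path, int level)
{
    char buf[16];
    int len = snprintf(buf, sizeof(buf), "%d\n", level);
    ssize_t written;
    int fd;

    if (len < 0 || (size_t)len >= sizeof(buf))
        return -1;

    fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    written = write(fd, buf, (size_t)len);
    close(fd);

    return written == (ssize_t)len ? 0 : -1;
}
```

With set_console_loglevel("/proc/sys/kernel/printk", 4), informational
messages such as the tsc ones would stay in the kernel log but no longer
be printed on the console.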

thanks,
Willy