Re: [RFC PATCH 0/6] Deduplicating RISCV cmpxchg.h macros

From: Leonardo Bras Soares Passos
Date: Tue Mar 21 2023 - 23:44:03 EST


On Tue, Mar 21, 2023 at 4:48 PM Conor Dooley <conor@xxxxxxxxxx> wrote:
>
> On Tue, Mar 21, 2023 at 02:01:59PM -0300, Leonardo Bras Soares Passos wrote:
> > On Tue, Mar 21, 2023 at 4:49 AM Conor Dooley <conor.dooley@xxxxxxxxxxxxx> wrote:
> > > On Tue, Mar 21, 2023 at 03:30:35AM -0300, Leonardo Brás wrote:
> > > > On Sun, 2023-03-19 at 20:35 +0000, Conor Dooley wrote:
> > > > > On Sat, Mar 18, 2023 at 05:00:54AM -0300, Leonardo Bras wrote:
> > > > > > While studying riscv's cmpxchg.h file, I got really interested in
> > > > > > understanding how the RISC-V asm implements the different versions of
> > > > > > {cmp,}xchg.
> > > > > >
> > > > > > When I understood the pattern, it made sense to me to remove the
> > > > > > duplication and create macros that make it easier to see what exactly
> > > > > > changes between the versions: instruction suffixes & barriers.
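> > > > > >
> > > > > > As a rough sketch of the idea (the macro below is only illustrative,
> > > > > > not the exact code in the patches), the asm body can be shared and
> > > > > > parameterized by the instruction suffix and the barriers:
> > > > > >
> > > > > > #define __xchg_asm(sfx, prepend, append, r, p, n)	\
> > > > > > 	__asm__ __volatile__ (				\
> > > > > > 		prepend					\
> > > > > > 		"	amoswap" sfx " %0, %2, %1\n"	\
> > > > > > 		append					\
> > > > > > 		: "=&r" (r), "+A" (*(p))		\
> > > > > > 		: "r" (n)				\
> > > > > > 		: "memory")
> > > > > >
> > > > > > where sfx would be e.g. ".w" or ".d.aqrl", and prepend/append the
> > > > > > fence strings needed by each ordering variant.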
> > > > > >
> > > > > > I split those changes into 3 levels for each of cmpxchg and xchg,
> > > > > > resulting in a total of 6 patches. I did this so it becomes easier to
> > > > > > review, and to remove the last levels if desired, but I have no issue
> > > > > > squashing them if that's better.
> > > > > >
> > > > > > Please provide comments.
> > > > > >
> > > > > > Thanks!
> > > > > > Leo
> > > > > >
> > > > > > Leonardo Bras (6):
> > > > > > riscv/cmpxchg: Deduplicate cmpxchg() asm functions
> > > > > > riscv/cmpxchg: Deduplicate cmpxchg() macros
> > > > > > riscv/cmpxchg: Deduplicate arch_cmpxchg() macros
> > > > >
> > > > > > riscv/cmpxchg: Deduplicate xchg() asm functions
> > > > >
> > > > > FWIW, this patch seems to break the build pretty badly:
> > > > > https://patchwork.kernel.org/project/linux-riscv/patch/20230318080059.1109286-5-leobras@xxxxxxxxxx/
> > > >
> > > > Thanks for pointing that out!
> > > >
> > > > It was an error in an intermediate patch:
> > > > the suffix for amoswap in the acquire version was "d.aqrl" instead of
> > > > the correct ".d.aqrl", and that caused the failure.
> > > >
> > > > I did not notice it because the next commit made the code more general,
> > > > and thus removed this line. I will send a v2-RFC shortly.
> > > >
> > > > I see that patch 4/6 has 5 failures, but in each one of them I can see:
> > > > "I: build output in /ci/workspace/[...]" or
> > > > "I: build output in /tmp/[...]".
> > >
> > > I don't push the built artifacts anywhere, just the brief logs -
> > > although the "failed to build" log isn't very helpful other than telling
> > > you the build broke.
> > > That seems like a bug w.r.t. where tuxmake prints its output. I'll try
> > > to fix that.
> >
> > Thanks for that :)
>
> I've pushed what I think is a fix; the wrong log file was being grepped
> for errors in the case of a failed build.

Thanks!

>
> > > > I could not find any reference to where this is saved, though.
> > > > Could you point out where I can find the error output, just for the sake
> > > > of further debugging?
> > > >
> > > > >
> > > > > Patches 1 & 5 also add quite a lot of sparse issues (around 1000), but I
> > > > > think that may be more an artifact of the testing process than something
> > > > > caused by this patchset.
> > > >
> > For those I can see the build output diffs. Both added error lines to
> > conchuod/build_rv64_gcc_allmodconfig.
> > I tried to mimic this with [make allmodconfig + gcc build with
> > CROSS_COMPILE=riscv64-linux-gnu-], but I could not reproduce any errors
> > on any patch.
> > >
> > > If you can't replicate them, it's probably because they come from
> > > sparse. I only really mentioned it here in case you went looking at the
> > > other patches in the series and were wondering why things were so amiss.
> >
> > Oh, it makes sense.
> >
> > >
> > > > For patch 1/6 it removes as many error lines (-) as it adds (+), and the
> > > > error messages are mostly the same, apart from a line number.
> > >
> > > I don't see a line number difference, but rather an increase in the
> > > number of times the same error lines appear in the output.
> > > I don't find the single-line output from sparse to be very helpful;
> > > I should probably have a think about how to present that
> > > information better.
> >
> > Oh, I see.
> > The number at the beginning is the number of times the error occurs.
> > OK, it makes sense to me now :)
> >
> > >
> > > > For patch 5/6 it actually adds many more lines, but tracking down (some
> > > > of) the errors did not give me any clue as to why.
> > >
> > > Probably just sparse being unhappy with the way some of the macros were
> > > changed - but some of it (the "Should it be static?" bits) looks very much
> > > like the patch causing things to be rebuilt only for the "after the
> > > patch" build, but somehow not for the "before" build.
> >
> > Hmm, I am not sure I understood that last part:
> > what I get is that patch 5/6 is causing things to be rebuilt, and
> > that was not the case for 1-4/6.
> > Is that what you said?
> > If so, and you are doing an incremental build, changing the macros in
> > 5/6 should trigger rebuilds, but it does not make sense for 1-4/6 not
> > to be rebuilt, as they change the same macros.
>
> Right, it is an incremental build.
> First it builds the tree with a patch applied, then it checks out HEAD~1
> and builds that tree. Then it goes back to the tree with the patch
> applied and builds it again. The output from builds 2 & 3 is compared
> to see if any errors snuck in.
> In theory that should ensure the same files get rebuilt, since builds
> 2 & 3 have had the same diff applied to the previous one, just in
> opposite directions - but I am a little worried that ccache sometimes
> gets involved and leads to the before/after builds not being exactly
> the same.

Makes sense to me.

>
> They may very well be real issues, as your refactor has changed the
> casting in the macros or whatever. Not my area of expertise by any
> stretch of the imagination, but the lkp sparse is out of date & doesn't
> run any more, so I figure it's better to be running this stuff, even if
> it does sometimes result in a splurge of errors like this, than to
> forget about poor auld sparse.

I agree with you.
Better to deal with sporadic spurious errors than not to test at all.

>
> Cheers,
> Conor.

Best regards,
Leo