Re: stable-rc/linux-5.6.y bisection: baseline.dmesg.crit on bcm2837-rpi-3-b

From: Guillaume Tucker
Date: Thu Jun 25 2020 - 02:50:39 EST


On 25/06/2020 06:24, kernelci.org bot wrote:
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> * This automated bisection report was sent to you on the basis *
> * that you may be involved with the breaking commit it has *
> * found. No manual investigation has been done to verify it, *
> * and the root cause of the problem may be somewhere else. *
> * *
> * If you do send a fix, please include this trailer: *
> * Reported-by: "kernelci.org bot" <bot@xxxxxxxxxxxx> *
> * *
> * Hope this helps! *
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
>
> stable-rc/linux-5.6.y bisection: baseline.dmesg.crit on bcm2837-rpi-3-b
>
> Summary:
> Start: 61aba373f570 Linux 5.6.19
> Plain log: https://storage.kernelci.org/stable-rc/linux-5.6.y/v5.6.19/arm64/defconfig/gcc-8/lab-baylibre/baseline-bcm2837-rpi-3-b.txt
> HTML log: https://storage.kernelci.org/stable-rc/linux-5.6.y/v5.6.19/arm64/defconfig/gcc-8/lab-baylibre/baseline-bcm2837-rpi-3-b.html
> Result: 9cf5d5444c78 Revert "cgroup: Add memory barriers to plug cgroup_rstat_updated() race window"
>
> Checks:
> revert: PASS
> verify: PASS
>
> Parameters:
> Tree: stable-rc
> URL: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
> Branch: linux-5.6.y
> Target: bcm2837-rpi-3-b
> CPU arch: arm64
> Lab: lab-baylibre
> Compiler: gcc-8
> Config: defconfig
> Test case: baseline.dmesg.crit

The "crit" kernel message is:

[ 17.536674] hwmon hwmon1: Undervoltage detected!

which is a known intermittent issue on rpi-3-b that is most
likely to be unrelated to the patch found by this bisection.

Maybe the bisection landed on this patch because it makes the
undervoltage more likely by a side effect, if it affects timing
or the binary size or addresses in very subtle ways. But even
then it wouldn't be a regression.

We should stop running bisections on rpi-3-b, at least for this
kernel error message, until it has been fixed to avoid future
false positives.

Sorry for the noise.

Guillaume


> Breaking commit found:
>
> -------------------------------------------------------------------------------
> commit 9cf5d5444c78c14bb9f90dd21cef361ee14ba64b
> Author: Tejun Heo <tj@xxxxxxxxxx>
> Date: Thu Apr 9 14:55:35 2020 -0400
>
> Revert "cgroup: Add memory barriers to plug cgroup_rstat_updated() race window"
>
> [ Upstream commit d8ef4b38cb69d907f9b0e889c44d05fc0f890977 ]
>
> This reverts commit 9a9e97b2f1f2 ("cgroup: Add memory barriers to plug
> cgroup_rstat_updated() race window").
>
> The commit was added in anticipation of memcg rstat conversion which needed
> synchronous accounting for the event counters (e.g. oom kill count). However,
> the conversion didn't get merged due to percpu memory overhead concern which
> couldn't be addressed at the time.
>
> Unfortunately, the patch's addition of smp_mb() to cgroup_rstat_updated()
> meant that every scheduling event now had to go through an additional full
> barrier and Mel Gorman noticed it as 1% regression in netperf UDP_STREAM test.
>
> There's no need to have this barrier in tree now and even if we need
> synchronous accounting in the future, the right thing to do is separating that
> out to a separate function so that hot paths which don't care about
> synchronous behavior don't have to pay the overhead of the full barrier. Let's
> revert.
>
> Signed-off-by: Tejun Heo <tj@xxxxxxxxxx>
> Reported-by: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
> Link: http://lkml.kernel.org/r/20200409154413.GK3818@xxxxxxxxxxxxxxxxxxx
> Cc: v4.18+
> Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>
>
> diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
> index 6f87352f8219..41ca996568df 100644
> --- a/kernel/cgroup/rstat.c
> +++ b/kernel/cgroup/rstat.c
> @@ -33,12 +33,9 @@ void cgroup_rstat_updated(struct cgroup *cgrp, int cpu)
> return;
>
> /*
> - * Paired with the one in cgroup_rstat_cpu_pop_updated(). Either we
> - * see NULL updated_next or they see our updated stat.
> - */
> - smp_mb();
> -
> - /*
> + * Speculative already-on-list test. This may race leading to
> + * temporary inaccuracies, which is fine.
> + *
> * Because @parent's updated_children is terminated with @parent
> * instead of NULL, we can tell whether @cgrp is on the list by
> * testing the next pointer for NULL.
> @@ -134,13 +131,6 @@ static struct cgroup *cgroup_rstat_cpu_pop_updated(struct cgroup *pos,
> *nextp = rstatc->updated_next;
> rstatc->updated_next = NULL;
>
> - /*
> - * Paired with the one in cgroup_rstat_cpu_updated().
> - * Either they see NULL updated_next or we see their
> - * updated stat.
> - */
> - smp_mb();
> -
> return pos;
> }
> -------------------------------------------------------------------------------
>
>
> Git bisection log:
>
> -------------------------------------------------------------------------------
> git bisect start
> # good: [7111951b8d4973bda27ff663f2cf18b663d15b48] Linux 5.6
> git bisect good 7111951b8d4973bda27ff663f2cf18b663d15b48
> # bad: [61aba373f5708f2aebc3f72078e51949a068aa6f] Linux 5.6.19
> git bisect bad 61aba373f5708f2aebc3f72078e51949a068aa6f
> # good: [0bc4d081e87ef8fd43baeed7141267e558896f86] RDMA/cm: Fix an error check in cm_alloc_id_priv()
> git bisect good 0bc4d081e87ef8fd43baeed7141267e558896f86
> # good: [0deceb85de592b89f6476e2bfcb5321a95d23126] dmaengine: owl: Use correct lock in owl_dma_get_pchan()
> git bisect good 0deceb85de592b89f6476e2bfcb5321a95d23126
> # bad: [566f16e2d62fb3817529a05147ef46046ecffe6a] io_uring: initialize ctx->sqo_wait earlier
> git bisect bad 566f16e2d62fb3817529a05147ef46046ecffe6a
> # good: [23952755bc2f839844b5a60c29620cce98c9b09d] riscv: stacktrace: Fix undefined reference to `walk_stackframe'
> git bisect good 23952755bc2f839844b5a60c29620cce98c9b09d
> # good: [be31933db49c8e3d81b96ee6bc9933d2adafe13d] drm/amd/display: Fix potential integer wraparound resulting in a hang
> git bisect good be31933db49c8e3d81b96ee6bc9933d2adafe13d
> # good: [ee899046f79db390650e546446a7a0196788d39d] esp6: get the right proto for transport mode in esp6_gso_encap
> git bisect good ee899046f79db390650e546446a7a0196788d39d
> # good: [bdd07c33a3be1896ec1982fab1a2baf986fbe4dc] netfilter: conntrack: comparison of unsigned in cthelper confirmation
> git bisect good bdd07c33a3be1896ec1982fab1a2baf986fbe4dc
> # bad: [46f4a75fb5034c8d80a5388fe3bb46a8ed05df5b] ARC: [plat-eznps]: Restrict to CONFIG_ISA_ARCOMPACT
> git bisect bad 46f4a75fb5034c8d80a5388fe3bb46a8ed05df5b
> # good: [960a4cc3ec49f8292d0f837f0a6b28b03c54f042] Linux 5.6.16
> git bisect good 960a4cc3ec49f8292d0f837f0a6b28b03c54f042
> # bad: [9cf5d5444c78c14bb9f90dd21cef361ee14ba64b] Revert "cgroup: Add memory barriers to plug cgroup_rstat_updated() race window"
> git bisect bad 9cf5d5444c78c14bb9f90dd21cef361ee14ba64b
> # good: [2df9b670ad1a2120f54982e1b1dfd17a6ce3cc0c] x86/syscalls: Revert "x86/syscalls: Make __X32_SYSCALL_BIT be unsigned long"
> git bisect good 2df9b670ad1a2120f54982e1b1dfd17a6ce3cc0c
> # first bad commit: [9cf5d5444c78c14bb9f90dd21cef361ee14ba64b] Revert "cgroup: Add memory barriers to plug cgroup_rstat_updated() race window"
> -------------------------------------------------------------------------------
>