Re: [PATCH 4.19 000/139] 4.19.7-stable review

From: Yuchung Cheng
Date: Wed Dec 05 2018 - 11:04:08 EST


On Wed, Dec 5, 2018 at 4:08 AM Rafael David Tinoco
<rafael.tinoco@xxxxxxxxxx> wrote:
>
> On 12/5/18 4:58 AM, Greg Kroah-Hartman wrote:
> > On Tue, Dec 04, 2018 at 07:09:46PM -0200, Rafael David Tinoco wrote:
> >> On 12/4/18 8:48 AM, Greg Kroah-Hartman wrote:
> >>> This is the start of the stable review cycle for the 4.19.7 release.
> >>> There are 139 patches in this series, all will be posted as a response
> >>> to this one. If anyone has any issues with these being applied, please
> >>> let me know.
> >>>
> >>> Responses should be made by Thu Dec 6 10:36:22 UTC 2018.
> >>> Anything received after that time might be too late.
> >>>
> >>> The whole patch series can be found in one patch at:
> >>> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.7-rc1.gz
> >>> or in the git tree and branch at:
> >>> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y
> >>> and the diffstat can be found below.
> >>>
> >>> thanks,
> >>>
> >>> greg k-h
> >>
> >> During functional tests for this v4.19 release, we faced a PANIC,
> >> described bellow, but unlikely related to this specific v4.19 version.
> >>
> >> First a WARN() at tcp_output.c:
> >>
> >> tcp_send_loss_probe():
> >> ...
> >> /* Retransmit last segment. */
> >> if (WARN_ON(!skb))
> >> goto rearm_timer;
> >> ...
> >>
> >> [ 173.557528] WARNING: CPU: 1 PID: 0 at
> >> /srv/oe/build/tmp-rpb-glibc/work-shared/juno/kernel-source/net/ipv4/tcp_output.c:2485
> >> tcp_send_loss_probe+0x164/0x1e8
> >> [ 173.571425] Modules linked in: crc32_ce crct10dif_ce fuse
> >> [ 173.576804] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.19.7-rc1 #1
> >> [ 173.583014] Hardware name: ARM Juno development board (r2) (DT)
> >
> > So only this one machine saw this failure?
> >
> > If you can reproduce it again, bisection would be great to do if
> > possible.
>
> Yes, the only machine... I'm afraid this issue is intermittent and
> depends on TCP Tail Loss and a specific race causing the NULL
> dereference, so bisection would be tricky since it has happened
> independently of the functional test that was running. I have also
> copied authors for the Tail Loss code to check if they got any clues
> even without KASAN data.
There cause is an inconsistency of packet accounting: TCP
retransmission queue is empty but tp->packets_out is not zero. We will
send a fix soon. Thanks.

>
> Thank you,
> -
> Rafael D. Tinoco
>