Re: Crash in msm serial on dragonboard with ftrace bootargs

From: Sai Prakash Ranjan
Date: Wed Oct 17 2018 - 07:54:07 EST


On 10/17/2018 4:39 AM, Joel Fernandes wrote:
On Tue, Oct 16, 2018 at 05:08:25PM +0530, Sai Prakash Ranjan wrote:
Hi,

On dragonboard 410c, with "ftrace=function" boot args, the console output
slows down and board resets without any backtrace as below. This is tested
on latest kernel and seems to exist even in older kernels as well.

[ 2.949164] EINJ: ACPI disabled.
[ 3.133001] Serial: 8250/16550 dri
Format: Log Type - Time(microsec) - Message - Optional Info
Log Type: B - Since Boot(Power On Reset), D - Delta, S - Statistic

But with pstore enabled, able to get the below backtrace:
<4>[ 2.949164] EINJ: ACPI disabled.
<6>[ 3.133001] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
<6>[ 3.164097] SuperH (H)SCI(F) driver initialized
<0>[ 3.164471] Internal error: synchronous external abort: 96000010 [#1]
PREEMPT SMP
<4>[ 3.164479] Modules linked in:
<4>[ 3.164495] CPU: 2 PID: 1 Comm: swapper/0 Not tainted
4.19.0-rc8-00008-ge033b9909fff-dirty #175
<4>[ 3.164501] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC
(DT)
<4>[ 3.164508] pstate: 40000085 (nZcv daIf -PAN -UAO)
<4>[ 3.164514] pc : msm_read.isra.2+0x20/0x50
<4>[ 3.164520] lr : msm_read.isra.2+0x1c/0x50
<4>[ 3.164526] sp : ffff000008033a50
<4>[ 3.164531] x29: ffff000008033a50 x28: ffff000009486018
<4>[ 3.164548] x27: 0000000000000001 x26: ffff7dfffe7ff070
<4>[ 3.164565] x25: 0000000000000034 x24: ffff000009486000
<4>[ 3.164582] x23: 0000000000000000 x22: ffff00000978e190
<4>[ 3.164599] x21: ffff0000095e8228 x20: 0000000000000034
<4>[ 3.164616] x19: ffff7dfffe7ff008 x18: ffffffffffffffff
<4>[ 3.164632] x17: 0000000000000000 x16: 0000000000000000
<4>[ 3.164649] x15: ffff0000094a96c8 x14: ffff00008978e6bf
<4>[ 3.164666] x13: ffff00000978e6cd x12: 0000000000000038
<4>[ 3.164683] x11: ffff0000094c6000 x10: 0000000000000c24
<4>[ 3.164699] x9 : ffff80003c89b400 x8 : ffff000008033970
<4>[ 3.164716] x7 : ffff80000eb04100 x6 : 00000000000af304
<4>[ 3.164732] x5 : 0000000000000c40 x4 : ffff80003c06f000
<4>[ 3.164750] x3 : ffff80003c89b498 x2 : 0000000000000000
<4>[ 3.164766] x1 : ffff80003ca68000 x0 : 0000000000000800
<0>[ 3.164785] Process swapper/0 (pid: 1, stack limit =
0x(____ptrval____))
<4>[ 3.164791] Call trace:
<4>[ 3.164797] msm_read.isra.2+0x20/0x50
<4>[ 3.164804] msm_reset_dm_count+0x44/0x80
<4>[ 3.164810] __msm_console_write+0x1c8/0x1d0
<4>[ 3.164816] msm_serial_early_write_dm+0x3c/0x50
<4>[ 3.164823] console_unlock.part.6+0x468/0x528
<4>[ 3.164829] vprintk_emit+0x210/0x218
<4>[ 3.164835] vprintk_default+0x48/0x58
<4>[ 3.164841] vprintk_func+0xf0/0x1c0
<4>[ 3.164847] printk+0x74/0x94
<4>[ 3.164853] sci_init+0x24/0x3c
<4>[ 3.164859] do_one_initcall+0x54/0x248
<4>[ 3.164866] kernel_init_freeable+0x210/0x378
<4>[ 3.164872] kernel_init+0x18/0x118
<4>[ 3.164878] ret_from_fork+0x10/0x1c
<0>[ 3.164884] Code: aa1e03e0 8b214273 97e616f7 d503201f (b9400260)

Seems to be issue with the msm serial driver and not ftrace.
Could someone look into it.

One more thing is for pstore dmesg-ramoops, I had to change late_initcall to
postcore_initcall which brings the question as to why we changed to
late_initcall?
Simple git blame shows to support crypto compress api, but is it really
helpful? A lot of boottime issues can be caught with pstore enabled at
postcore_initcall rather than late_initcall, this backtrace
is just one example. Is there any way we could change this?

Any chance you are able to dig deeper into the stack trace? I would
disassemble vmlinux and see which part of msm_read is generating the
synchronous external abort and look into that.


Yes I had checked the part of msm_read which was generating the abort and it always seems to be in "msm_wait_for_xmitr" at below pointed location.

static inline void msm_wait_for_xmitr(struct uart_port *port)
{
while (!(msm_read(port, UART_SR) & UART_SR_TX_EMPTY)) { <---
if (msm_read(port, UART_ISR) & UART_ISR_TX_READY)
break;
udelay(1);
}
msm_write(port, UART_CR_CMD_RESET_TX_READY, UART_CR);
}

Also I could confirm that this path is entered repeatedly(with tracing register reads/writes from my patch series in https://lore.kernel.org/patchwork/cover/983795/ and tp_printk) and crash is seen at some random time, so could not get much from this.

Also similar to what Steve suggested, I wonder if it boots for you if you
annotate all the functions in the serial driver with 'notrace'.


I have tried this too, but still the target crashes. So I am doubtful if this is ftrace issue? Maybe earlycon uart issue as Stephen is suspecting.

Thanks,
Sai

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation