Re: [PATCH] drivers/i2c-aspeed: avoid invalid memory reference after timeout

From: Heyi Guo
Date: Fri Jan 14 2022 - 09:01:13 EST


Hi Joel,


在 2022/1/11 下午6:51, Joel Stanley 写道:
On Tue, 11 Jan 2022 at 07:52, Heyi Guo <guoheyi@xxxxxxxxxxxxxxxxx> wrote:
Hi all,

Any comments?

Thanks,

Heyi

在 2022/1/9 下午9:26, Heyi Guo 写道:
The memory will be freed by the caller if transfer timeout occurs,
then it would trigger kernel panic if the peer device responds with
something after timeout and triggers the interrupt handler of aspeed
i2c driver.

Set the msgs pointer to NULL to avoid invalid memory reference after
timeout to fix this potential kernel panic.
Thanks for the patch. How did you discover this issue? Do you have a
test I can run to reproduce the crash?

We are using one i2c channel to communicate with another MCU by implementing user space SSIF protocol, and the MCU may not respond in time if it is busy. If it responds after timeout occurs, it will trigger below kernel panic:

[11844.248998] 8<--- cut here ---
[11844.252419] Unhandled fault: page domain fault (0x81b) at 0x7edb2ff5
[11844.259509] pgd = aa62a338
[11844.262524] [7edb2ff5] *pgd=b5bee835, *pte=b020275f, *ppte=b0202c7f
[11844.269525] Internal error: : 81b [#1] SMP ARM
[11844.274479] Modules linked in: [last unloaded: i2c_dev_sysfs]
[11844.280895] CPU: 0 PID: 256 Comm: ssif-host Tainted: G           O      5.4.39-e0b210f #1
[11844.290016] Hardware name: Generic DT based system
[11844.295367] PC is at aspeed_i2c_master_irq+0xc8/0x498
[11844.301001] LR is at 0x16
[11844.303917] pc : [<80636b4c>]    lr : [<00000016>]    psr: 000f0193
[11844.310905] sp : b5c53ce8  ip : 7edb2fe0  fp : b5c53d04
[11844.316730] r10: b5c52000  r9 : 00000000  r8 : 0000002e
[11844.322554] r7 : 00000001  r6 : 00000004  r5 : 00000004  r4 : b4b91440
[11844.329825] r3 : 00000000  r2 : b642448c  r1 : 00000015  r0 : 00000000
[11844.337107] Flags: nzcv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none
[11844.345163] Control: 10c5387d  Table: b5a9006a  DAC: 00000051
[11844.351572] Process ssif-host (pid: 256, stack limit = 0xbe5df348)
[11844.358466] Stack: (0xb5c53ce8 to 0xb5c54000)
[11844.363325] 3ce0:                   b4b91440 00000004 b4b916bc 00000001 b5c53d2c b5c53d08
[11844.372451] 3d00: 80636f80 80636a90 b4b8d140 80c052e4 b91e8068 00000001 0000002e 00000000
[11844.381577] 3d20: b5c53d6c b5c53d30 8017b13c 80636f28 80839580 b53a8600 b91e8000 b5c53d70
[11844.390702] 3d40: 80c8fcb4 b91e8000 80c052e4 b91e8068 00000001 b900c800 b5c53e28 b648863c
[11844.399828] 3d60: b5c53d8c b5c53d70 8017b348 8017b0f4 00000000 80c04c48 b91e8000 80c052e4
[11844.408954] 3d80: b5c53dac b5c53d90 8017b3fc 8017b318 b91e8000 80c052e4 00000000 00000001
[11844.418079] 3da0: b5c53dc4 b5c53db0 801800f4 8017b3b0 0000002e 80b771c0 b5c53dd4 b5c53dc8
[11844.427205] 3dc0: 8017a1bc 80180040 b5c53dfc b5c53dd8 8017a8d0 8017a198 bf80200c 000003ff
[11844.436330] 3de0: 80c052e4 bf802000 00000000 b5c53e28 b5c53e24 b5c53e00 801022f8 8017a87c
[11844.445456] 3e00: 80634000 200f0013 ffffffff b5c53e5c b6488640 b5c52000 b5c53eac b5c53e28
[11844.454583] 3e20: 80101aac 801022b8 b6488640 38c3f000 00000059 80b7ca58 7edb2974 00000000
[11844.463709] 3e40: b6424480 00000002 b6488640 b6424480 b648863c b5c53eac 0000005d b5c53e78
[11844.472835] 3e60: 00000001 80634000 200f0013 ffffffff 00000051 7f000000 00000018 b53a8600
[11844.481960] 3e80: 00000018 7edb2974 00000000 b53a8600 00000051 b4c5ce40 b5c52000 00000036
[11844.491086] 3ea0: b5c53ee4 b5c53eb0 80634850 80633fc0 b5fd3015 802a7770 7edb2990 00000002
[11844.500212] 3ec0: b4b94778 80c04c48 7edb2974 b4b94778 b4c5ce40 00000007 b5c53f6c b5c53ee8
[11844.509338] 3ee0: 802c6c48 80634528 00000020 b4b94778 b4c5ce48 00000142 b5c53f34 b5c53f08
[11844.518463] 3f00: 802a604c 802a5e2c 00000000 00000000 00000000 00000000 00000000 b5fd3000
[11844.527589] 3f20: b4c5ce40 b5fd3000 b5c53f4c b5c53f38 802c27e0 802a6010 00000007 80c04c48
[11844.536715] 3f40: b5c53f94 b4c5ce40 00000007 00000707 7edb2974 b4c5ce40 b5c52000 00000036
[11844.545841] 3f60: b5c53f94 b5c53f70 802c71dc 802c6870 00000007 00000016 0000000c 00000036
[11844.554967] 3f80: 80101204 b5c52000 b5c53fa4 b5c53f98 802c7220 802c71a4 00000000 b5c53fa8
[11844.564092] 3fa0: 80101000 802c7214 00000007 00000016 00000007 00000707 7edb2974 7edb2990
[11844.573218] 3fc0: 00000007 00000016 0000000c 00000036 7edb2be1 7edb2fe0 00000030 76f436b4
[11844.582344] 3fe0: 0004d110 7edb2964 00033f48 45f2ffac 200f0010 00000007 00000000 00000000
[11844.591464] Backtrace:
[11844.594189] [<80636a84>] (aspeed_i2c_master_irq) from [<80636f80>] (aspeed_i2c_bus_irq+0x64/0x180)
[11844.604187]  r7:00000001 r6:b4b916bc r5:00000004 r4:b4b91440
[11844.610506] [<80636f1c>] (aspeed_i2c_bus_irq) from [<8017b13c>] (__handle_irq_event_percpu+0x54/0x224)
[11844.620893]  r9:00000000 r8:0000002e r7:00000001 r6:b91e8068 r5:80c052e4 r4:b4b8d140
[11844.629537] [<8017b0e8>] (__handle_irq_event_percpu) from [<8017b348>] (handle_irq_event_percpu+0x3c/0x98)
[11844.640311]  r10:b648863c r9:b5c53e28 r8:b900c800 r7:00000001 r6:b91e8068 r5:80c052e4
[11844.649044]  r4:b91e8000
[11844.651861] [<8017b30c>] (handle_irq_event_percpu) from [<8017b3fc>] (handle_irq_event+0x58/0x7c)
[11844.661759]  r5:80c052e4 r4:b91e8000
[11844.665747] [<8017b3a4>] (handle_irq_event) from [<801800f4>] (handle_fasteoi_irq+0xc0/0x180)
[11844.675259]  r7:00000001 r6:00000000 r5:80c052e4 r4:b91e8000
[11844.681574] [<80180034>] (handle_fasteoi_irq) from [<8017a1bc>] (generic_handle_irq+0x30/0x44)
[11844.691181]  r5:80b771c0 r4:0000002e
[11844.695169] [<8017a18c>] (generic_handle_irq) from [<8017a8d0>] (__handle_domain_irq+0x60/0xc4)
[11844.704878] [<8017a870>] (__handle_domain_irq) from [<801022f8>] (gic_handle_irq+0x4c/0x94)
[11844.714198]  r9:b5c53e28 r8:00000000 r7:bf802000 r6:80c052e4 r5:000003ff r4:bf80200c
[11844.722838] [<801022ac>] (gic_handle_irq) from [<80101aac>] (__irq_svc+0x6c/0x90)
[11844.731185] Exception stack(0xb5c53e28 to 0xb5c53e70)
[11844.736819] 3e20:                   b6488640 38c3f000 00000059 80b7ca58 7edb2974 00000000
[11844.745945] 3e40: b6424480 00000002 b6488640 b6424480 b648863c b5c53eac 0000005d b5c53e78
[11844.755068] 3e60: 00000001 80634000 200f0013 ffffffff
[11844.760704]  r9:b5c52000 r8:b6488640 r7:b5c53e5c r6:ffffffff r5:200f0013 r4:80634000
[11844.769347] [<80633fb4>] (i2cdev_ioctl_rdwr) from [<80634850>] (i2cdev_ioctl+0x334/0x3bc)
[11844.778473]  r10:00000036 r9:b5c52000 r8:b4c5ce40 r7:00000051 r6:b53a8600 r5:00000000
[11844.787206]  r4:7edb2974
[11844.790024] [<8063451c>] (i2cdev_ioctl) from [<802c6c48>] (do_vfs_ioctl+0x3e4/0x934)
[11844.798664]  r7:00000007 r6:b4c5ce40 r5:b4b94778 r4:7edb2974
[11844.804979] [<802c6864>] (do_vfs_ioctl) from [<802c71dc>] (ksys_ioctl+0x44/0x70)
[11844.813232]  r10:00000036 r9:b5c52000 r8:b4c5ce40 r7:7edb2974 r6:00000707 r5:00000007
[11844.821966]  r4:b4c5ce40
[11844.824781] [<802c7198>] (ksys_ioctl) from [<802c7220>] (sys_ioctl+0x18/0x1c)
[11844.832745]  r9:b5c52000 r8:80101204 r7:00000036 r6:0000000c r5:00000016 r4:00000007
[11844.841385] [<802c7208>] (sys_ioctl) from [<80101000>] (ret_fast_syscall+0x0/0x54)
[11844.849827] Exception stack(0xb5c53fa8 to 0xb5c53ff0)
[11844.855453] 3fa0:                   00000007 00000016 00000007 00000707 7edb2974 7edb2990
[11844.864578] 3fc0: 00000007 00000016 0000000c 00000036 7edb2be1 7edb2fe0 00000030 76f436b4
[11844.873701] 3fe0: 0004d110 7edb2964 00033f48 45f2ffac
[11844.879337] Code: e1a03423 e281e001 e6ef3073 e584e2a4 (e7cc3001)
[11844.886143] ---[ end trace 9422f39bd8d33cb5 ]---

After applying this patch, we'll get below warning instead:

"bus in unknown state. irq_status: 0x%x\n"

The issue can be reproduced more easily by setting the timeout to some smaller value.


Can you provide a Fixes tag?

I think the bug was introduced by the first commit of this file :(

f327c686d3ba44eda79a2d9e02a6a242e0b75787



Do other i2c master drivers do this? I took a quick look at the meson
driver and it doesn't appear to clear it's pointer to msgs.

It is hard to say. It seems other drivers have some recover scheme like aborting the transfer, or loop each messages in process context and don't do much in IRQ handler, which may disable interrupts or not retain the buffer pointer before returning timeout.

Thanks,

Heyi



Signed-off-by: Heyi Guo <guoheyi@xxxxxxxxxxxxxxxxx>

-------

Cc: Brendan Higgins <brendanhiggins@xxxxxxxxxx>
Cc: Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx>
Cc: Joel Stanley <joel@xxxxxxxxx>
Cc: Andrew Jeffery <andrew@xxxxxxxx>
Cc: Philipp Zabel <p.zabel@xxxxxxxxxxxxxx>
Cc: linux-i2c@xxxxxxxxxxxxxxx
Cc: openbmc@xxxxxxxxxxxxxxxx
Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
Cc: linux-aspeed@xxxxxxxxxxxxxxxx
---
drivers/i2c/busses/i2c-aspeed.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/drivers/i2c/busses/i2c-aspeed.c b/drivers/i2c/busses/i2c-aspeed.c
index 67e8b97c0c950..3ab0396168680 100644
--- a/drivers/i2c/busses/i2c-aspeed.c
+++ b/drivers/i2c/busses/i2c-aspeed.c
@@ -708,6 +708,11 @@ static int aspeed_i2c_master_xfer(struct i2c_adapter *adap,
spin_lock_irqsave(&bus->lock, flags);
if (bus->master_state == ASPEED_I2C_MASTER_PENDING)
bus->master_state = ASPEED_I2C_MASTER_INACTIVE;
+ /*
+ * All the buffers may be freed after returning to caller, so
+ * set msgs to NULL to avoid memory reference after freeing.
+ */
+ bus->msgs = NULL;
spin_unlock_irqrestore(&bus->lock, flags);

return -ETIMEDOUT;