Re: [PATCH] net/cdc_ncm: fix null pointer panic at usbnet_link_change

From: BjÃrn Mork
Date: Thu Oct 31 2013 - 05:03:39 EST


"Du, ChangbinX" <changbinx.du@xxxxxxxxx> writes:

>> From: BjÃrn Mork [mailto:bjorn@xxxxxxx]
>> Sent: Tuesday, October 29, 2013 4:41 PM
>> To: Du, ChangbinX
>> Cc: oliver@xxxxxxxxxx; linux-usb@xxxxxxxxxxxxxxx; netdev@xxxxxxxxxxxxxxx;
>> linux-kernel@xxxxxxxxxxxxxxx
>> Subject: Re: [PATCH] net/cdc_ncm: fix null pointer panic at usbnet_link_change
>>
>> "Du, ChangbinX" <changbinx.du@xxxxxxxxx> writes:
>>
>> > From: "Du, Changbin" <changbinx.du@xxxxxxxxx>
>> >
>> > In cdc_ncm_bind() function, it call cdc_ncm_bind_common() to setup usb.
>> > But cdc_ncm_bind_common() may meet error and cause usbnet_disconnect()
>> > be called which calls free_netdev(net).
>>
>> I am sure you are right, but I really don't see how that can happen.
>>
>> AFAICS, we ensure that the intfdata is set to NULL before calling
>> usb_driver_release_interface() in the error cleanup in
>> cdc_ncm_bind_common():
>>
>>
>> error2:
>> usb_set_intfdata(ctx->control, NULL);
>> usb_set_intfdata(ctx->data, NULL);
>> if (ctx->data != ctx->control)
>> usb_driver_release_interface(driver, ctx->data);
>> error:
>> cdc_ncm_free((struct cdc_ncm_ctx *)dev->data[0]);
>> dev->data[0] = 0;
>> dev_info(&dev->udev->dev, "bind() failure\n");
>> return -ENODEV;
>>
>>
>> Thus we hit the test in disconnect, and usbnet_disconnect() is never
>> called:
>>
>> static void cdc_ncm_disconnect(struct usb_interface *intf)
>> {
>> struct usbnet *dev = usb_get_intfdata(intf);
>>
>> if (dev == NULL)
>> return; /* already disconnected */
>>
>> usbnet_disconnect(intf);
>> }
>>
>> So we should really be OK, but we are not???? I would appreciate it if
>> anyone took the time to feed me why, with a very small spoon...
>>
>
> Yes, you are right. It should not happen if cdc_ncm_disconnect is not
> called. But this really happened. It produced on kernel 3.10.

Yes, I do not doubt it. I just don't understand it. There is no
contradiction there. I believe lots of stuff I don't understand :-)

The version this happened is very useful, although I don't think there
are any relevant changes after v3.10.

> Here is the full panic message for you to refer. I will get a method try to
> reproduce it.

That would be great if you were able to. Otherwise it will be difficult
to verify that the proposed fix works.

In any case, as I said: I believe a fix which avoids the call to
usbnet_link_change() in case of bind failure is correct and
appropriate. But I suggest that you do it by removing the call and
comment from cdc_ncm_bind() and instead set the FLAG_LINK_INTR in the
driver_info. That's how this should have been implemented from the
beginning.

> [ 15.635727] BUG: Bad page state in process khubd pfn:31994
> [ 15.641989] page:f3ad9280 count:0 mapcount:0 mapping:00000020 index:0x0
> [ 15.649384] page flags: 0x40000000()
> [ 15.670096] BUG: unable to handle kernel NULL pointer dereference at 00000078
> [ 15.678078] IP: [<c24b7f5e>] usbnet_link_change+0x1e/0x80
> [ 15.684120] *pde = 00000000
> [ 15.687339] Oops: 0000 [#1] SMP
> [ 15.690953] Modules linked in: atmel_mxt_ts vxd392 videobuf_vmalloc videobuf_core bcm_bt_lpm bcm432)
> [ 15.702658] CPU: 0 PID: 573 Comm: khubd Tainted: G B W O 3.10.1+ #1
> [ 15.710241] task: f27e8f30 ti: f2084000 task.ti: f2084000
> [ 15.716270] EIP: 0060:[<c24b7f5e>] EFLAGS: 00010246 CPU: 0
> [ 15.722396] EIP is at usbnet_link_change+0x1e/0x80
> [ 15.727747] EAX: f18524c0 EBX: 00000000 ECX: f18524c0 EDX: 00000000
> [ 15.734746] ESI: f18524c0 EDI: 00000000 EBP: f2085b7c ESP: f2085b70
> [ 15.741746] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> [ 15.747775] CR0: 8005003b CR2: 00000078 CR3: 02c3b000 CR4: 001007d0
> [ 15.754774] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> [ 15.761774] DR6: ffff0ff0 DR7: 00000400
> [ 15.766054] Stack:
> [ 15.768295] 00000000 f18524c0 c2a03818 f2085b8c c24bc69a f1852000 f1f52e00 f2085be0
> [ 15.776991] c24b8d32 00000000 00000001 00000000 c2167f2c f2085bb4 c2821253 f2085bec
> [ 15.785687] 00000246 00000246 f18d8088 f1f52e1c f1fce464 f1fce400 f18524c0 c28a06e0
> [ 15.794376] Call Trace:
> [ 15.797109] [<c24bc69a>] cdc_ncm_bind+0x3a/0x50
> [ 15.802267] [<c24b8d32>] usbnet_probe+0x282/0x7d0
> [ 15.807621] [<c2167f2c>] ? sysfs_new_dirent+0x6c/0x100
> [ 15.813460] [<c2821253>] ? mutex_lock+0x13/0x40
> [ 15.818618] [<c24bb278>] cdc_ncm_probe+0x8/0x10
> [ 15.823777] [<c24e0ef7>] usb_probe_interface+0x187/0x2c0

I still do not understand how this happens, but usbnet_link_change will
schedule some delayed work which absolutely is not appropriate if
usbnet_probe fails. There are no cancel_work calls in the usbnet_probe
exit path....

So the call should definitely be avoided if bind fails.


BjÃrn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/