Re: [linux-usb-devel] Re: serious 2.6 bug in USB subsystem?

From: David Brownell
Date: Wed Mar 10 2004 - 11:27:53 EST


I'll send out a revised patch later, thanks! It's also good
this code got a more careful read. It seems like some things
are not as obvious as I might like...

That patch will merge those list corruption fixes I sent, the
"else" you verified was needed (ugh!!!), and some of what you
include here. It won't add new BUG_ON calls (WARN at worst)
or a duplicate ED state (see next).


>> ... add a new state
>> ED_DESCHEDULED, which is treated exactly like ED_IDLE, except
>> that in this state, the HC may still be referring to the ED in
>> question. Thus, if

David.B> Sounds exactly like ED_UNLINK -- except maybe that it's not
David.B> been put onto ed_rm_list (with ED_DEQUEUE set). Why add
David.B> another state?
...

The current OHCI relies on the internals of the dma_pool()
implementation. If the implementation changed to, say, modify the
memory that is free or, heaven forbid, return the memory to the
kernel, you'd get _extremely_ difficult to track down race conditions.

The current implementation _does_ poison memory on free, if
slab poisoning is enabled. (That's why I asked if you were
using it.)

And that's been quite handy for reporting list corruption bugs,
from races or otherwise. But those list corruption bugs hit a
blind spot in that code: it doesn't check for modify-after-free.
Which is why those bugs were able to hide for so long!


It'd be good if you said _how_ you think it relies on such
internals. Some of your debug diagnostics wrongly claimed
allocation of "new" EDs when they were just being re-used.
That'd make intentional/safe re-use look like a bug or race.


Even so, the code isn't race-free, like I explained yesterday:

- ed_alloc() clears the ED to 0 via memset()
- the order in which memset() clears memory is undefined (various
from platform to platform etc)

There's a wmb() before any ED is handed off to the OHCI silicon;
that forces a defined order. Top of ed_schedule(). First use,
or Nth re-use; no matter.

- thus you might get a case where hwTailP is 0 but hwHeadP
is non-zero, which would cause the HC to happily start
dereferencing the descriptor

If you assume a bug where the ED is freed but still in use,
that's hardly the only thing that'd go wrong!! You can't
use such a potential bug to prove something else is broken.

- Dave



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/