Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE

From: Thomas Hellstrom
Date: Fri Sep 13 2013 - 01:33:37 EST


On 09/12/2013 11:50 PM, Maarten Lankhorst wrote:
Op 12-09-13 18:44, Thomas Hellstrom schreef:
On 09/12/2013 05:45 PM, Maarten Lankhorst wrote:
Op 12-09-13 17:36, Daniel Vetter schreef:
On Thu, Sep 12, 2013 at 5:06 PM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
So I'm poking around the preemption code and stumbled upon:

drivers/gpu/drm/i915/i915_gem.c: set_need_resched();
drivers/gpu/drm/ttm/ttm_bo_vm.c: set_need_resched();
drivers/gpu/drm/ttm/ttm_bo_vm.c: set_need_resched();
drivers/gpu/drm/udl/udl_gem.c: set_need_resched();

All these sites basically do:

while (!trylock())
yield();

which is a horrible and broken locking pattern.

Firstly its deadlock prone, suppose the faulting process is a FIFOn+1
task that preempted the lock holder at FIFOn.

Secondly the implementation is worse than usual by abusing
VM_FAULT_NOPAGE, which is supposed to install a PTE so that the fault
doesn't retry, but you're using it as a get out of fault path. And
you're using set_need_resched() which is not something a driver should
_ever_ touch.

Now I'm going to take away set_need_resched() -- and while you can
'reimplement' it using set_thread_flag() you're not going to do that
because it will be broken due to changes to the preempt code.

So please as to fix ASAP and don't allow anybody to trick you into
merging silly things like that again ;-)
The set_need_resched in i915_gem.c:i915_gem_fault can actually be
removed. It was there to give the error handler a chance to sneak in
and reset the hw/sw tracking when the gpu is dead. That hack goes back
to the days when the locking around our error handler was somewhere
between nonexistent and totally broken, nowadays we keep things from
live-locking by a bit of magic in i915_mutex_lock_interruptible. I'll
whip up a patch to rip this out. I'll also check that our testsuite
properly exercises this path (needs a bit of work on a quick look for
better coverage).

The one in ttm is just bonghits to shut up lockdep: ttm can recurse
into it's own pagefault handler and then deadlock, the trylock just
keeps lockdep quiet. We've had that bug arise in drm/i915 due to some
fun userspace did and now have testcases for them. The right solution
to fix this is to use copy_to|from_user_atomic in ttm everywhere it
holds locks and have slowpaths which drops locks, copies stuff into a
temp allocation and then continues. At least that's how we've fixed
all those inversions in i915-gem. I'm not volunteering to fix this ;-)
Ah the case where a mmap'd address is passed to the execbuf ioctl? :P

Fine I'll look into it a bit, hopefully before tuesday. Else it might take a bit longer since I'll be on my way to plumbers..
I think a possible fix would be if fault() were allowed to return an error and drop the mmap_sem() before returning.

Otherwise we need to track down all copy_to_user / copy_from_user which happen with bo::reserve held.
CONFIG_PROVE_LOCKING=y

and hard grab that reserve lock within the fault handler, done.. lockdep will spit it out for you :p

~Maarten

Given that all copy_to_user / copy_from_user paths are actually hit during testing, right?

/Thomas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/