Re: Deadlock in do_page_fault() on ARM (old kernel)

From: Russell King - ARM Linux
Date: Fri Jan 17 2014 - 20:21:11 EST

In reply to: Alan Ott: "Re: Deadlock in do_page_fault() on ARM (old kernel)"

On Fri, Jan 17, 2014 at 07:57:16PM -0500, Alan Ott wrote:
> On 01/17/2014 08:46 AM, Russell King - ARM Linux wrote:
>> My suspicion therefore is that some other thread must have died while
>> holding the mmap_sem, so there's probably a kernel oops earlier...
>> that's my best guess at the moment without seeing the full backtrace.
>
> There's no oops that I'm able to see.
>
> Each of the tasks which lockdep reports as "holding" mmap_sem is
> blocking on it. If some other task had taken it and then crashed, I
> assume lockdep would list the crashed task as also holding the resource
> in the printout.

My point is this:

- the five (or six) threads which are trying to take the mmap_sem in
read-mode in the fault handler are all blocked on it - they haven't
taken the lock, which happens only because there's a pending writer.
- of these, in your original post, two faulted from
__copy_to_user_std(). __copy_to_user_std() doesn't take the mmap_sem -
this is the non-uaccess-with-memcpy path.
- the pending writers are the two threads in sys_mmap_pgoff(), both of
which are blocked waiting to gain the write lock.
- there are no *other* threads holding the mmap_sem lock.
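To illustrate why that combination is puzzling, here is a toy user-space model of a fair reader/writer semaphore. It is not the kernel's rwsem code; it assumes only that a waiter is granted when it is compatible with the current holders and that waiters are woken strictly in FIFO order. The names (struct sem, enqueue(), grant_waiters()) are hypothetical. Under those assumptions, two queued writers and five queued readers with no holder is not a stable state: the first writer gets the lock at once.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a fair reader/writer semaphore. This is NOT the kernel's
 * rwsem implementation; it only encodes two assumed rules:
 *   1. a read is compatible if no writer holds the lock; a write is
 *      compatible only if nobody holds the lock;
 *   2. waiters are woken strictly in FIFO order. */

#define QMAX 16

enum kind { READ, WRITE };

struct sem {
    int readers;            /* active read holders */
    bool writer;            /* true while a writer holds the lock */
    enum kind queue[QMAX];  /* FIFO of blocked requests */
    int head, tail;         /* queue[head..tail) are still waiting */
};

/* Append a blocked request to the wait queue. */
static void enqueue(struct sem *s, enum kind k)
{
    assert(s->tail < QMAX);
    s->queue[s->tail++] = k;
}

static bool compatible(const struct sem *s, enum kind k)
{
    return k == READ ? !s->writer : (!s->writer && s->readers == 0);
}

/* Wake waiters from the head of the queue until one cannot be granted.
 * Returns how many were granted. */
static int grant_waiters(struct sem *s)
{
    int granted = 0;

    while (s->head < s->tail && compatible(s, s->queue[s->head])) {
        if (s->queue[s->head] == READ)
            s->readers++;
        else
            s->writer = true;
        s->head++;
        granted++;
    }
    return granted;
}
```

With readers == 0, grant_waiters() hands the lock to the first queued writer immediately; only a holder that doesn't show up in the dump (readers == 1 in the model) leaves all seven waiters stuck.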

So... the question is how we got into this state - and frankly
I don't know. What I do see from your latest dump is that there are two
unknown modules there - something called rcu2m and another called
buttoms - and there are two threads inside ioctls in those modules. Both
have faulted from the function at 0xc0d2a394 (which won't appear in the
backtrace, but is most likely __copy_to_user_std).

So, in the absence of you saying anything about any preceding
oopses, my conclusion now is that one of those modules is taking the
mmap_sem itself, and is the culprit inducing this deadlock.
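The suspected sequence can be sketched as a simplified, hypothetical model, again assuming a strictly FIFO semaphore rather than the real rwsem/mmap_sem code. Thread 1 stands for a module ioctl that takes mmap_sem for read, thread 2 for sys_mmap_pgoff() waiting for write, and thread 1's second read request for do_page_fault() after a fault in __copy_to_user_std(). The helper names (sem_init(), acquire(), self_deadlocked()) are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical FIFO reader/writer semaphore; NOT the kernel's rwsem.
 * Assumption: a new request never overtakes one that is already queued. */

#define MAXT 8
#define QMAX 8

enum kind { READ, WRITE };

struct req { int tid; enum kind kind; };

struct sem {
    int read_held[MAXT];    /* read locks held, per thread id */
    int readers;            /* total active readers */
    int writer;             /* tid of the write holder, or -1 */
    struct req queue[QMAX]; /* FIFO of blocked requests */
    int qlen;
};

static void sem_init(struct sem *s)
{
    *s = (struct sem){ .writer = -1 };
}

/* Returns true if granted at once, false if the caller is now queued. */
static bool acquire(struct sem *s, int tid, enum kind kind)
{
    bool ok = (kind == READ) ? s->writer < 0
                             : (s->writer < 0 && s->readers == 0);

    assert(tid >= 0 && tid < MAXT);
    if (ok && s->qlen == 0) {
        if (kind == READ) {
            s->readers++;
            s->read_held[tid]++;
        } else {
            s->writer = tid;
        }
        return true;
    }
    assert(s->qlen < QMAX);
    s->queue[s->qlen++] = (struct req){ tid, kind };
    return false;
}

/* A thread that already holds a read lock and is queued behind a writer
 * can never be woken: the writer waits for that read lock to be released,
 * and the thread cannot release it until its own request is granted. */
static bool self_deadlocked(const struct sem *s, int tid)
{
    bool writer_ahead = false;

    if (s->read_held[tid] == 0)
        return false;
    for (int i = 0; i < s->qlen; i++) {
        if (s->queue[i].tid == tid)
            return writer_ahead;
        if (s->queue[i].kind == WRITE)
            writer_ahead = true;
    }
    return false;           /* not waiting at all */
}
```

In this model a recursive read acquisition is only safe while no writer is queued between the two requests; once sys_mmap_pgoff() is waiting in between, neither thread can make progress, while the other fault-path readers are merely blocked behind the writer.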

Note that the dump ([2]) in your reply was just the hung task detector
printing out the stack traces for a few tasks, not the full all-threads
stack dump I was expecting.

So I'm drawing these conclusions from the very little information
you're supplying.

--
FTTC broadband for 0.8mile line: 5.8Mbps down 500kbps up. Estimates
in the database were 13.1 to 19Mbit for a good line, about 7.5+ for a bad one.
Estimate before purchase was "up to 13.2Mbit".
