Re: [BUG 3.7-rc1] nouveau cli->mutex possible recursive locking detected

From: Arend van Spriel
Date: Thu Oct 25 2012 - 05:26:49 EST


On 10/24/2012 02:45 PM, Arend van Spriel wrote:
On 10/24/2012 01:14 PM, Arend van Spriel wrote:
On 10/16/2012 02:43 PM, Stanislaw Gruszka wrote:
I have this lockdep warning on wireless-testing tree based
on 3.7-rc1 (no other patches except wireless bits).

=============================================
Restarting tasks ... done.
[ INFO: possible recursive locking detected ]
3.7.0-rc1-wl+ #2 Not tainted
---------------------------------------------
Xorg/2269 is trying to acquire lock:
(&cli->mutex){+.+.+.}, at: [<ffffffffa012a27f>]
nouveau_bo_move_m2mf+0x5f/0x170 [nouveau]

but task is already holding lock:
(&cli->mutex){+.+.+.}, at: [<ffffffffa012f3c4>]
nouveau_abi16_get+0x34/0x100 [nouveau]


I have observed the same bug, so I built and tested the v3.7-rc2 tag with
lockdep enabled. It has the same problem, and it results in a failure to
resume after suspend. See below.

Gr. AvS

Digging into the trace:


nouveau_gem_ioctl_pushbuf() calls nouveau_abi16_get(), which grabs the
mutex. I assume this is meant to protect the chan variable passed to
nouveau_gem_pushbuf_validate(), which does a bit more than validate, as
it ends up in nouveau_bo_move_m2mf(), which uses drm->chan. However,
it deadlocks before that, because nouveau_bo_move_m2mf() tries to take
the same cli->mutex again.
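
To illustrate the pattern, here is a minimal userspace sketch, not the
nouveau code. It assumes a POSIX error-checking mutex in place of the kernel
mutex, so the second lock attempt reports EDEADLK instead of hanging. The
names cli_sketch, cli_sketch_init, move_sketch and pushbuf_sketch are
hypothetical stand-ins for the client and the two call sites in the trace.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the nouveau client; only the mutex matters. */
struct cli_sketch {
    pthread_mutex_t mutex;
};

/* Set up an error-checking mutex so a relock by the owner is reported. */
static int cli_sketch_init(struct cli_sketch *cli)
{
    pthread_mutexattr_t attr;
    int ret = pthread_mutexattr_init(&attr);

    if (ret)
        return ret;
    ret = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (!ret)
        ret = pthread_mutex_init(&cli->mutex, &attr);
    pthread_mutexattr_destroy(&attr);
    return ret;
}

/* Plays the role of nouveau_bo_move_m2mf(): takes the client mutex itself. */
static int move_sketch(struct cli_sketch *cli)
{
    int ret = pthread_mutex_lock(&cli->mutex);

    if (ret)
        return ret; /* EDEADLK when the caller already holds the mutex */
    /* ... the buffer move would happen here ... */
    pthread_mutex_unlock(&cli->mutex);
    return 0;
}

/* Plays the role of nouveau_gem_ioctl_pushbuf(): lock first, then move. */
static int pushbuf_sketch(struct cli_sketch *cli)
{
    int ret = pthread_mutex_lock(&cli->mutex); /* like nouveau_abi16_get() */

    if (ret)
        return ret;
    ret = move_sketch(cli); /* second lock attempt on the same mutex */
    pthread_mutex_unlock(&cli->mutex);
    return ret;
}
```

After cli_sketch_init() succeeds, pushbuf_sketch() returns EDEADLK, while
calling move_sketch() on its own returns 0. With a default (non-checking)
mutex the nested call would simply block, which is the situation lockdep is
warning about.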

Gr. AvS

Maybe this helps. The two locations where the lock is grabbed are from the same commit (see below).

Gr. AvS

commit ebb945a94bba2ce8dff7b0942ff2b3f2a52a0a69
Author: Ben Skeggs <bskeggs@xxxxxxxxxx>
Date: Fri Jul 20 08:17:34 2012 +1000

drm/nouveau: port all engines to new engine module format

This is a HUGE commit, but it's not nearly as bad as it looks - any problems
can be isolated to a particular chipset and engine combination. It was
simply too difficult to port each one at a time, the compat layers are
*already* ridiculous.

Most of the changes here are simply to the glue, the process for each of the
engine modules was to start with a standard skeleton and copy+paste the old
code into the appropriate places, fixing up variable names etc as needed.

v2: Marcin Slusarz <marcin.slusarz@xxxxxxxxx>
- fix find/replace bug in license header

v3: Ben Skeggs <bskeggs@xxxxxxxxxx>
- bump indirect pushbuf size to 8KiB, 4KiB barely enough for userspace and
left no space for kernel's requirements during GEM pushbuf submission.
- fix duplicate assignments noticed by clang

v4: Marcin Slusarz <marcin.slusarz@xxxxxxxxx>
- add sparse annotations to nv04_fifo_pause/nv04_fifo_start
- use ioread32_native/iowrite32_native for fifo control registers

v5: Ben Skeggs <bskeggs@xxxxxxxxxx>
- rebase on v3.6-rc4, modified to keep copy engine fix intact
- nv10/fence: unmap fence bo before destroying
- fixed fermi regression when using nvidia gr fuc
- fixed typo in supported dma_mask checking

Signed-off-by: Ben Skeggs <bskeggs@xxxxxxxxxx>


