Re: drm/mgag200: doesn't work in panic context

From: Daniel Vetter
Date: Sat Jun 27 2015 - 09:53:08 EST


On Fri, Jun 26, 2015 at 8:30 PM, Luck, Tony <tony.luck@xxxxxxxxx> wrote:
>>> I'm here to report two panics which hang forever (the machine cannot reboot). It is because mgag200 doesn't work in panic context. It sleeps and allocates memory non-atomically.
>>
>> This is the same for all drm drivers, the drm atomic handling with
>> fbcon/fbdev is totally broken. It would be serious work to fix this
>> properly.
>
> It's a serious problem when a server crashes ... even worse when it hangs while doing so
> because we have to rely on some other agent to notice the hung server and go poke it
> with a stick.
>
> If it is too hard to fix all of the drivers, is it possible to attack this in the allocator?

Hm, what do you mean by fixing this in the allocator? I've made some
rough sketch of the problem space in
http://www.x.org/wiki/DRMJanitors/ under "Make panic handling work".
The problem is that the folks who know what to do (drm hackers) have
zero incentive to fix it (since if you blow up a drm driver, any kind
of fbcon panic handling is hopeless anyway).

The other problem is that this is a serious effort with tons of
little things all over to consider. My gut estimate is that it'll
probably take something on the order of a man year to fix this for real.
David Herrmann has supplied parts of the required puzzle to actually
be able to somewhat reliably show panics on drm modesetting drivers,
but that didn't contain any of the work to make fbdev not totally suck
at panic handling first. And I guess for general distros and servers
that's needed - developers simply disable all of fbdev to be able to
debug kms hangs.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch