Re: kill -9 <pid of X>

Jon M. Taylor (taylorj@ecs.csus.edu)
Tue, 11 Aug 1998 18:22:22 -0700 (PDT)


On Tue, 11 Aug 1998, Andrea Arcangeli wrote:

> On Tue, 11 Aug 1998, Tigran Aivazian wrote:
>
> >Hello guys,
> >
> >Yes, it *is* very much a kernel issue. What happens if you kill -9 the X
>
> It's not a kernel issue.

Unfortunately for us, that's true.

> If the X server run in iopl() and own the video
> card, _it_ must care that everything will return OK when it's killed.

Yes. But this is not a simple problem, to put it mildly. See
below.

> It's possible using a X wrapper that run as root (not as user) and wait
> for the child to die (once the X child is died the wrapper can restore the
> text console fine).

Not in all cases. There is no magic way to reset any video card
into a known good state, no matter what state it is currently in. This
can only be done if you know what state the video card is in, and only the
X server knows that. Therefore, the X server *must* clean up the hardware
state before it is killed. And that requirement opens up a whole can of
worms.
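
For reference, the wrapper approach quoted above might look roughly like
the following sketch in C. The names run_supervised() and
restore_text_mode() are made up for illustration, and the KDSETMODE call
is only an assumption about what "restore the text console" would mean in
practice. It tells the console layer to return to text mode; it does not
know, and cannot undo, whatever the dead server left in the card's
registers or FIFO.

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/kd.h>
#include <signal.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: start argv[0] as a child and wait for it to die.
 * On success returns 0 and stores in *killed_by the number of the signal
 * that killed the child, or 0 if it exited on its own.
 */
int run_supervised(char *const argv[], int *killed_by)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);                     /* exec failed */
    }
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    *killed_by = WIFSIGNALED(status) ? WTERMSIG(status) : 0;
    return 0;
}

/*
 * Hypothetical cleanup step: ask the console driver to go back to text
 * mode. This resets the kernel's idea of the console mode only; it is
 * an assumption that this is enough for any given card.
 */
int restore_text_mode(const char *tty_path)
{
    int rc;
    int fd = open(tty_path, O_RDWR);

    if (fd < 0)
        return -1;
    rc = ioctl(fd, KDSETMODE, KD_TEXT);
    close(fd);
    return rc;
}
```

A wrapper would call run_supervised() with the X server's command line
and, if killed_by comes back nonzero, call restore_text_mode() on the
console device (the path, e.g. /dev/tty0, is an assumption). Nothing in
this sketch can tell what state the hardware itself was left in.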

> >server? The machine hangs because the kernel does not sanify the
>
> The machine doesn't hang.

In most cases. Not all. I have locked my machine hard by killing
the X server. It usually doesn't happen, but occasionally it does. In
particular, most video cards do not take well to being interrupted midway
through an accelerator command FIFO fill. Other cards have weird timing
issues, and still others may be capable of bus mastering; if such a card
is not told to relinquish its lock on the bus, the machine will hang. In
general, there are many cases where atomicity must be preserved during
video card programming. XFree86 is not designed to have to worry about
this, and neither is Linux.

This is another illustration of why userspace video drivers can
never be made to work 100% properly and reliably. Just giving iopl()
privs to a task isn't enough. You also have to ensure atomicity of
hardware programming and coherent maintenance of hardware state knowledge.
XFree86 isn't set up for this because it isn't designed to be preempted.
It controls VC switches, which allows it to cleanly hand off control of
the video card to the controller of another VC, be that SVGALib or another
X server or the kernel itself. It can execute the VT switch after card
programming is finished.

But if the X server must respond to a kill -9 in the same manner
as other tasks, it cannot guarantee atomicity and coherent state anymore.
A signal handler would have to be able to finish whatever programming was
necessary to bring the hardware to a known sane state, no matter where the
original programming was interrupted, which is basically impossible.
Nothing but a scheduling lock around the critical sections will do, to
ensure that the signal cannot be received until it can safely be handled.
Coding such locks would entail a massive amount of recoding of XFree86's
driver structure to avoid having to throw locks around a lot of code that
wouldn't need them. XFree86 is not designed to have to worry about this
issue.

I don't know how an suid root program would go about doing this (I
am assuming that this is possible in the first place), but let's assume it
is done. Now, when the X server's VC is active, you have it preempting
the scheduler every single time it needs to fire an acceleration. This
loses you the benefit of using hardware acceleration in the first place,
which is that the accel op can run in parallel with the CPU. Also, you'd
better not be trying to run anything timing-critical in the kernel or
userspace, because all those scheduler locks will suck up a lot of CPU
time and may throw off existing kernel timing guarantees. Packet loss,
here we come! Fire up a bunch of X apps which use XAA and watch the rest
of the system come to a grinding halt. Fun fun fun.

You could get around this problem by coding into the kernel a
special process type for the X server. The kernel would either have to
block signals (some or all) from the X server process, or it would have to
thunk the appropriate signals into a fake VT switch notification/kill
signal combination. That last, grotesque hack though it would be, might
be the easiest solution to this problem. I don't know for sure. What I
DO know is that whatever the solution, it involves giving the X server
even more kernel-level powers and responsibilities (special signal
handling or scheduling manipulation) in addition to those it already has
(direct IO port access, raw keyboard access, VT switch handling).

But sooth! Look at fbcon! Critical sections are not a problem
because the drivers are in the kernel and are not tasks, subject to
signals and scheduling. Maintaining coherent state is not a problem,
because of the same reasons. No special permissions are needed, because
the code is a kernel device driver. VT switching coordination is not an
issue, because that is handled elsewhere. Everything just works right
from the get-go, and if it doesn't you have a hell of a lot less code to
wade through to fix the problem. Your X server can be just another task,
too. Funny how **ALL** problems associated with suid root graphics apps
just melt away in the sun.
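
As a rough illustration of the X server being just another task, the
sketch below asks a framebuffer device for its current mode through the
fbdev ioctl interface, with no iopl() and no direct register access. The
function name query_fb() and the device path in the usage note are
assumptions; only the FBIOGET_VSCREENINFO ioctl and struct
fb_var_screeninfo come from the kernel's <linux/fb.h>.

```c
#include <fcntl.h>
#include <linux/fb.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: read the current video mode from a framebuffer
 * device node. Returns 0 on success and fills *out, or -1 on failure.
 * The kernel driver owns the hardware; this task only talks to it.
 */
int query_fb(const char *path, struct fb_var_screeninfo *out)
{
    int rc;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    rc = ioctl(fd, FBIOGET_VSCREENINFO, out);
    close(fd);
    return rc < 0 ? -1 : 0;
}
```

Called as query_fb("/dev/fb0", &info), this would fill in fields such as
info.xres, info.yres and info.bits_per_pixel on a system with a
framebuffer driver loaded. If the calling task is killed at any point,
the driver's view of the hardware state is unaffected.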

You _can_ ensure safety with less in the kernel than KGI or even
fbcon has. But you **MUST** have hardware locking, access control and
state maintenance in the kernel, with each device controlled by one driver
and no other code. Nothing else can guarantee safety and consistency of
behavior. In the stacksmashing thread, I claimed that suid root code, if
it must be used, must take responsibility for ensuring that it does not
compromise the stability or security of the system. XFree86 CANNOT DO
THIS BY ITSELF. It doesn't have the needed tools, even with suid root
privs.

Either use XF86_FBDev, do a massive recoding of both XFree86 and
the kernel, or live with the possibility of crashes. There aren't any
other choices.

Jon

---
'Cloning and the reprogramming of DNA is the first serious step in 
becoming one with God.'
	- Scientist G. Richard Seed
