Re: PROBLEM: uvesafb broken as of Linux 2.6.24.x

From: Andrew Morton
Date: Thu Jul 10 2008 - 22:08:13 EST


(cc linux-fbdev-devel)

On Mon, 07 Jul 2008 19:17:29 +0200 Mihai Moldovan <ionic@xxxxxxxx> wrote:

> Hello,
>
> I see a weird problem with uvesafb and any recent Kernel. It seems like
> the problem was introduced in some later 2.6.24 version. I have more
> information regarding this, but I will first explain the problem(s) I
> experience.
>
> After booting a faulty Kernel, these messages appear in my Kernel log
> ring buffer ("dmesg"):
>
>
> [ 112.816609] uvesafb: mode switch failed (eax=0x2104, err=0). Trying
> again with default timings.
> [ 112.819540] uvesafb: mode switch failed (eax=0x2104, err=0)
>
> Please note that these messages are the first ones after having booted
> the box. (Due to the init scripts, the VT was automatically switched to
> VT7, where X resides; after that I switched back to VT1.)
>
> Switching to other VTs does *not* reproduce the warning/error messages.
>
> Now to the interesting part.
>
> When starting any program that needs framebuffer support (which is why
> we use uvesafb, isn't it?), these messages reappear. I have tested
> mplayer with -vo fbdev or fbdev2, for example, on VT2. After starting
> it, playing a (video) file for some seconds and looking at dmesg again,
> these are the results:
>
> [ 564.757398] uvesafb: mode switch failed (eax=0x338, err=0). Trying
> again with default timings.
> [ 564.758358] uvesafb: mode switch failed (eax=0x2104, err=0)
> [ 564.838390] uvesafb: mode switch failed (eax=0x344, err=0). Trying
> again with default timings.
> [ 564.844749] uvesafb: mode switch failed (eax=0x2104, err=0)
> [ 564.929364] uvesafb: mode switch failed (eax=0x104c, err=0). Trying
> again with default timings.
> [ 564.937509] uvesafb: mode switch failed (eax=0x2105, err=0)
> [ 565.021358] uvesafb: mode switch failed (eax=0x42b, err=0). Trying
> again with default timings.
> [ 565.027047] uvesafb: mode switch failed (eax=0x2105, err=0)
> [ 565.109331] uvesafb: mode switch failed (eax=0x32b, err=0). Trying
> again with default timings.
> [ 565.111679] uvesafb: mode switch failed (eax=0x2105, err=0)
> [ 565.194323] uvesafb: mode switch failed (eax=0x2104, err=0). Trying
> again with default timings.
> [ 565.195379] uvesafb: mode switch failed (eax=0x2104, err=0)
> [ 565.278306] uvesafb: mode switch failed (eax=0x2104, err=0). Trying
> again with default timings.
> [ 565.280417] uvesafb: mode switch failed (eax=0x2104, err=0)
> [ 571.548365] uvesafb: mode switch failed (eax=0x2104, err=0). Trying
> again with default timings.
> [ 571.555713] uvesafb: mode switch failed (eax=0x10032b, err=0)
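
For anyone who wants to compare their own logs against the ones quoted
above, here is a minimal Python sketch that tallies the eax values from
"mode switch failed" lines. It is not part of the original report. The
function names and the SAMPLE text are made up for illustration, and the
regex assumes the unwrapped dmesg format of the quoted lines (the mail
client wrapped them here).

```python
import re
from collections import Counter

# Matches the failure line quoted above; the optional
# "Trying again with default timings." suffix is simply ignored.
FAIL_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+uvesafb: mode switch failed "
    r"\(eax=(?P<eax>0x[0-9a-fA-F]+), err=(?P<err>-?\d+)\)"
)


def parse_failures(log_text):
    """Return a list of (timestamp, eax, err) tuples, one per failure line."""
    return [
        (float(m.group("ts")), int(m.group("eax"), 16), int(m.group("err")))
        for m in FAIL_RE.finditer(log_text)
    ]


def tally_eax(failures):
    """Count how often each eax value appears, keyed by its hex string."""
    return Counter(hex(eax) for _, eax, _ in failures)


# Three lines copied from the report, unwrapped (illustrative input only).
SAMPLE = """\
[ 564.757398] uvesafb: mode switch failed (eax=0x338, err=0). Trying again with default timings.
[ 564.758358] uvesafb: mode switch failed (eax=0x2104, err=0)
[ 571.555713] uvesafb: mode switch failed (eax=0x10032b, err=0)
"""

if __name__ == "__main__":
    for value, count in tally_eax(parse_failures(SAMPLE)).most_common():
        print(value, count)
```

Piping the output of dmesg into parse_failures() instead of SAMPLE would
show whether the same eax values turn up on other hardware.
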
>
> Additionally, the console stops working and goes completely
> blank/black, and I did not even see the video. (That last symptom is
> not consistent, though: video playback might or might not work, it is
> a matter of luck.)
> "Recovering" from this situation is a little bit complicated. I have
> found the following workarounds:
>
> - Switch to the first VT (or any other, but it seems to be important
> that this VT has not been used by a framebuffer program) and then to
> the "old" VT again. Doing so might eventually bring the text back,
> but again, it is a matter of luck. Especially under high CPU and IO
> load this might not work and leave all your consoles blank. Also, you
> *must not* move too quickly from one console to another, or the problem
> might not disappear either. I have spent several minutes using this
> method and it just... s*cks.
> - Switch to the VT where X is running (this works almost every
> time, for details see below) and after that to your desired "old" VT.
> This method has a higher chance of success than the other one, but
> depending on the load of the box, you really might need several minutes
> to get any text again.
> - Now and then I was not able to switch back to the X VT or any
> other. The box was still running, no Kernel Panic or Oopses happened,
> but there was no way to get it back to work (on any VT, including the
> one with Xorg). Even restarting Xorg did not help anymore, and the only
> remaining measure was to reboot the box.
>
> Okay, that is the situation when using any framebuffer content.
>
> But even without framebuffer usage, the "blank console" problem can hit
> you, and you have to do one of the steps listed above in order to be
> able to use the box graphically again. (Not mentioning SSH and the like,
> those work without any problems, of course.)
>
> I cannot stress this enough: please keep in mind that all of these
> problems get worse under high load. I think this is important; you will
> now see why.
>
>
> I have got a copy of Linus' Linux-git tree and ran the bisect routine. I
> knew that the problem was introduced between 2.6.24.2 and 2.6.25, so I
> built and tested about 13 different kernels in this range.
> Finally, I was able to find the faulty patch... and was quite
> astonished. This is git's result:
>
> 8f4d37ec073c17e2d4aa8851df5837d798606d6f is first bad commit
> commit 8f4d37ec073c17e2d4aa8851df5837d798606d6f
> Author: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
> Date: Fri Jan 25 21:08:29 2008 +0100
>
> sched: high-res preemption tick
>
> Use HR-timers (when available) to deliver an accurate preemption tick.
>
> The regular scheduler tick that runs at 1/HZ can be too coarse when nice
> level are used. The fairness system will still keep the cpu utilisation
> 'fair' by then delaying the task that got an excessive amount of CPU
> time but try to minimize this by delivering preemption points spot-on.
>
> The average frequency of this extra interrupt is sched_latency /
> nr_latency. Which need not be higher than 1/HZ, its just that the
> distribution within the sched_latency period is important.
>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
> Signed-off-by: Ingo Molnar <mingo@xxxxxxx>
>
> :040000 040000 ab225228500f7a19d5ad20ca12ca3fc8ff5f5ad1 f1742e1d225a72aecea9d6961ed989b5943d31d8 M arch
> :040000 040000 25d85e4ef7a71b0cc76801a2526ebeb4dce180fe ae61510186b4fad708ef0211ac169decba16d4e5 M include
> :040000 040000 9247cec7dd506c648ac027c17e5a07145aa41b26 950832cc1dc4d30923f593ecec883a06b45d62e9 M kernel
>
> Do you see what I mean? Obviously the regression was not caused by a
> change to uvesafb itself (at least, no uvesafb code was touched) but
> was introduced by this preemption patch. This might explain why the
> problems concentrate on high load (although they do not occur only then).
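
About 13 test builds is what one would expect from bisecting a range of
several thousand commits, because each test halves the remaining range
(2^13 = 8192). A toy Python model of that search follows. It is only a
sketch, not git's actual implementation, which has to cope with a
non-linear history. Integers stand in for commits, and the position of
the bad commit (5000) is made up.

```python
import math


def first_bad(commits, is_bad):
    """Binary-search for the first commit for which is_bad() is True.

    Assumes commits[0] is known good, commits[-1] is known bad, and every
    commit after the first bad one is also bad (the property bisection
    relies on). Returns (first_bad_commit, number_of_tests_run).
    """
    good, bad = 0, len(commits) - 1
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests += 1  # one kernel build + boot + test per iteration
        if is_bad(commits[mid]):
            bad = mid
        else:
            good = mid
    return commits[bad], tests


def max_tests(n_candidates):
    """Upper bound on the tests needed to pick one of n_candidates commits."""
    return math.ceil(math.log2(n_candidates)) if n_candidates > 1 else 0


if __name__ == "__main__":
    # 8193 entries: one known-good start plus 8192 candidate commits.
    history = list(range(8193))
    commit, tests = first_bad(history, lambda c: c >= 5000)
    print(commit, tests, max_tests(8192))
```

The search only gives a trustworthy answer if every test result is
reliable, which is worth keeping in mind for a symptom that depends on
load and luck.
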
>
> Now, to be honest, I am a little bit puzzled about whom to contact. It
> might still be a bug in uvesafb, in which case I should have contacted
> Michal Januszewski ("spock") directly, because he is the original
> author of uvesafb. By the way, he is not listed in the MAINTAINERS
> file. Is this driver currently not maintained by anyone?
> On the other hand, my problem has been introduced by this somewhat lower
> level HR timer patch, so maybe Peter would have been the right person to
> ask.
>
> I have decided to let you decide however. :P
>
>
> Here is some other information which could be useful:
>
> [ 0.292261] uvesafb: NVIDIA Corporation, NV34 Board - p164-2n , Chip
> Rev , OEM: NVIDIA, VBE v3.0
> [ 0.301472] uvesafb: protected mode interface info at c000:e340
> [ 0.301544] uvesafb: pmi: set display start = c00ce376, set palette =
> c00ce3e0
> [ 0.301641] uvesafb: pmi: ports = 3b4 3b5 3ba 3c0 3c1 3c4 3c5 3c6 3c7
> 3c8 3c9 3cc 3ce 3cf 3d0 3d1 3d2 3d3 3d4 3d5 3da
> [ 0.304337] uvesafb: VBIOS/hardware supports DDC2 transfers
> [ 0.344795] Display is GTF capable
> [ 0.344895] uvesafb: monitor limits: vf = 200 Hz, hf = 132 kHz, clk =
> 350 MHz
> [ 0.345249] uvesafb: scrolling: ywrap using protected mode interface,
> yres_virtual=4915
> [ 0.744920] Switched to high resolution mode on CPU 0
> [ 0.847204] Console: switching to colour frame buffer device 160x64
> [ 0.893878] uvesafb: framebuffer at 0xd0000000, mapped to 0xf8880000,
> using 24576k, total 262144k
> [ 0.894386] fb0: VESA VGA frame buffer device
>
> The first bad Kernel version I have in use is:
>
> Linux version 2.6.24-OSS4-GIT-Regress-Test-g8f4d37ec-dirty (root@deff)
> (gcc version 4.1.2 20070214 ( (gdc 0.24, using dmd 1.020)) (Gentoo 4.1.2
> p1.0.2)) #2 PREEMPT Sat Jul 5 10:42:18 CEST 2008
>
> I have applied a custom patch as well: BadRAM. But I think this should
> not interfere with uvesafb.
>
> Relevant sections of my config file are:
>
> CONFIG_PREEMPT_NOTIFIERS=y
> # CONFIG_PREEMPT_RCU is not set
> # CONFIG_PREEMPT_NONE is not set
> # CONFIG_PREEMPT_VOLUNTARY is not set
> CONFIG_PREEMPT=y
> CONFIG_PREEMPT_BKL=y
> # CONFIG_DEBUG_PREEMPT is not set
> CONFIG_FB_UVESA=y
> CONFIG_SCHED_HRTICK=y
> CONFIG_NO_HZ=y
> # CONFIG_HZ_100 is not set
> # CONFIG_HZ_250 is not set
> # CONFIG_HZ_300 is not set
> CONFIG_HZ_1000=y
> CONFIG_HZ=1000
> CONFIG_HIGH_RES_TIMERS=y
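
Other uvesafb users who want to check whether their kernel is built like
the one in this report could compare their .config against the options
quoted above. A minimal Python sketch, assuming the usual .config syntax
("CONFIG_X=value" and "# CONFIG_X is not set"); the function names and
the REPORTED table are made up for illustration, and the table only
lists a subset of the quoted options.

```python
def parse_kconfig(text):
    """Parse kernel .config text into {option: value}; unset options map to None."""
    suffix = " is not set"
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            # "# CONFIG_FOO is not set" -> "CONFIG_FOO"
            options[line[2:-len(suffix)]] = None
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


# Subset of the options quoted in the report above.
REPORTED = {
    "CONFIG_FB_UVESA": "y",
    "CONFIG_SCHED_HRTICK": "y",
    "CONFIG_HIGH_RES_TIMERS": "y",
    "CONFIG_NO_HZ": "y",
    "CONFIG_PREEMPT": "y",
    "CONFIG_HZ": "1000",
}


def differences(config_text, reference=REPORTED):
    """Return {option: (local_value, reported_value)} for options that differ."""
    local = parse_kconfig(config_text)
    return {
        name: (local.get(name), want)
        for name, want in reference.items()
        if local.get(name) != want
    }


if __name__ == "__main__":
    # Hypothetical local config with the high-res preemption tick disabled.
    sample = (
        "CONFIG_FB_UVESA=y\n"
        "# CONFIG_SCHED_HRTICK is not set\n"
        "CONFIG_HIGH_RES_TIMERS=y\n"
        "CONFIG_NO_HZ=y\n"
        "CONFIG_PREEMPT=y\n"
        "CONFIG_HZ=1000\n"
    )
    print(differences(sample))
```

An empty result means the local config matches the report on all listed
options.
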
>
> If you need any other information, please do *not* hesitate to ask. I
> have provided only the information I thought could be useful.
>
>
> Also, I want to ask any other uvesafb user to test this and confirm the
> bug (if it can be confirmed, of course...)
>
> I have also tested the newest RC kernel (2.6.26-rc9) which faces the
> same problems.
>
>
>
> I hope this was all done correctly and that I have not broken any rule
> or missed anything.
>
>
> Finally, I want to personally thank Linus and all the other
> Kernel Hackers for the good work so far. Keep going! :)
>
>
> Have a nice afternoon (in Europe),
>
>
> Best regards,
>
>
>
> Mihai "Ionic" Moldovan
>
>
>
>
>
>
> P.S.: What is the status of BadRAM? Will it get into Mainline soon?
> AFAIK it has been pending since Feb 08 and I would really like to see
> it included. :)
