Re: [PATCH] NFS regression in 2.6.26?, "task blocked for more than 120 seconds"

From: Max Kellermann
Date: Mon Oct 20 2008 - 02:53:15 EST


On 2008/10/17 16:33, Glauber Costa <glommer@xxxxxxxxxx> wrote:
> That's probably something related to apic congestion.
> Does the problem go away if the only thing you change is this:
>
>
> > @@ -891,11 +897,6 @@ do_rest:
> > store_NMI_vector(&nmi_high, &nmi_low);
> >
> > smpboot_setup_warm_reset_vector(start_ip);
> > - /*
> > - * Be paranoid about clearing APIC errors.
> > - */
> > - apic_write(APIC_ESR, 0);
> > - apic_read(APIC_ESR);
> > }
>
>
> Please let me know.

Hello Glauber,

I rebooted the server with 2.6.27.1 plus this patchlet an hour ago.
No problems since.

Hardware: Compaq P4 Xeon server, Broadcom CMIC-WS / CIOB-X2 board.
Tell me if you need more detailed information.


On 2008/10/20 08:27, Ian Campbell <ijc@xxxxxxxxxxxxxx> wrote:
> The issue I see still occurs well before those changesets. I have
> seen it with v2.6.25 but v2.6.24 survived for 7 days without issue
> (my threshold for a good kernel is 7 days, hence bisecting is a bit
> slow...).

Hello Ian,

It seems we're hunting down different bugs after all. Too bad, I
had hoped my patch would solve your problem, too. Our machine has
been running well over the weekend with the patch I posted; with
faulty kernels, the problem would occur after a few minutes.

Max