Re: kerneloops.org report for the week

From: Ingo Molnar
Date: Sun Jun 28 2009 - 23:19:00 EST



* Arjan van de Ven <arjan@xxxxxxxxxxxxx> wrote:

> Few "highlights" this week
> * mem_cgroup_add_lru_list (rank 2) is a high rising issue;
> it's list corruption, question is why this is new
> * rank 13 (memcmp in the raid code) is also new
> * the warning in get_free_pages that has been discussed on lkml is dropping
> from the ranks again
>
>
> This week, a total of 15273 oopses and warnings have been reported,
> compared to 13384 reports in the previous week.
>
>
> Rank 2: mem_cgroup_add_lru_list (warn)
> Reported 1554 times (1622 total reports)
> List corruption in the VM code
> This oops was last seen in version 2.6.30-git19, and first seen in 2.6.29.
> More info: http://www.kerneloops.org/searchweek.php?search=mem_cgroup_add_lru_list

At least one list corruption bug was fixed by:

cb4cbcf: mm: fix incorrect page removal from LRU

> Rank 3: getnstimeofday (warning)
> Reported 1319 times (4893 total reports)
> [suspend resume] getnstimeofday() is called before timekeeping is resumed
> This oops was last seen in version 2.6.30, and first seen in 2.6.24.
> More info: http://www.kerneloops.org/searchweek.php?search=getnstimeofday

Probably caused by some buggy driver callback?

> Rank 7: hres_timers_resume (warning)
> Reported 763 times (2368 total reports)
> [suspend resume] hres_timers_resume() is incorrectly called with interrupts on
> This warning was last seen in version 2.6.30, and first seen in 2.6.24.7.
> More info: http://www.kerneloops.org/searchweek.php?search=hres_timers_resume

This is probably a driver incorrectly enabling irqs in a resume
callback. This should be easier and more specific to debug with the
lockdep based annotation i suggested for the suspend code in various
mails.

> Rank 8: generic_get_mtrr (warning)
> Reported 544 times (2061 total reports)
> BIOS bug where the MTRRs are not set up correctly
> This warning was last seen in version 2.6.30, and first seen in 2.6.25.3.
> More info: http://www.kerneloops.org/searchweek.php?search=generic_get_mtrr

I think this calls for enabling the x86 MTRR sanitizer by default -
544 out of 15273 reports this week (about 3.6%) suggests a significant
proportion of Linux systems is affected by MTRR setup problems.

I.e. we should change:

config MTRR_SANITIZER_ENABLE_DEFAULT
	int "MTRR cleanup enable value (0-1)"
	range 0 1
	default "0"

To 'default "1"'. Any objections?

If the MTRR sanitizer is enabled then i think the above warning in
generic_get_mtrr() should never trigger.
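
For reference, the proportions behind that estimate can be worked out
from the counts in Arjan's report. A minimal sketch in Python; the
numbers are copied from the quoted report, while the names
(WEEKLY_REPORTS, share_percent, weekly_growth_percent) are made up
here purely for illustration:

```python
# Weekly counts copied from the kerneloops.org report quoted above.
TOTAL_THIS_WEEK = 15273
TOTAL_LAST_WEEK = 13384

# Per-entry "Reported N times" figures for the entries discussed in this mail.
WEEKLY_REPORTS = {
    "mem_cgroup_add_lru_list": 1554,
    "getnstimeofday": 1319,
    "hres_timers_resume": 763,
    "generic_get_mtrr": 544,
}


def share_percent(count, total=TOTAL_THIS_WEEK):
    """Return count as a percentage of all reports in the week."""
    return 100.0 * count / total


def weekly_growth_percent(this_week=TOTAL_THIS_WEEK, last_week=TOTAL_LAST_WEEK):
    """Return the week-over-week change in total reports, in percent."""
    return 100.0 * (this_week - last_week) / last_week


if __name__ == "__main__":
    for name, count in WEEKLY_REPORTS.items():
        print(f"{name}: {share_percent(count):.1f}% of this week's reports")
    print(f"total reports changed by {weekly_growth_percent():.1f}%")
```

Note that this only measures the share of submitted reports, not the
share of installed systems, so it is a rough indicator at best.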

Ingo