Re: v2.6.26-rc7: BUG task_struct: Poison overwritten

From: Vegard Nossum
Date: Sat Jun 21 2008 - 16:41:21 EST


On Sat, Jun 21, 2008 at 9:28 PM, Vegard Nossum <vegard.nossum@xxxxxxxxx> wrote:
> Oops, seems there was another one a bit earlier (about 5 minutes) that I
> didn't notice. I don't think it helps that much, but here it is:

I actually got a third one too, but it's similar to the first two.

>
>
> =============================================================================
> BUG task_struct: Poison overwritten
> -----------------------------------------------------------------------------
>
> INFO: 0xf53ab018-0xf53ab02b. First byte 0x71 instead of 0x6b
> INFO: Allocated in copy_process+0x70/0x1090 age=110 cpu=1 pid=28664
> INFO: Freed in free_task+0x2c/0x30 age=68 cpu=0 pid=28667
> INFO: Slab 0xc1ba6cc0 objects=8 used=5 fp=0xf53aafd0 flags=0x400020c3
> INFO: Object 0xf53aafd0 @offset=12240 fp=0xf53acfb0
>
> Bytes b4 0xf53aafc0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
> Object 0xf53aafd0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> Object 0xf53aafe0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> Object 0xf53aaff0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> Object 0xf53ab000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> Object 0xf53ab010: 6b 6b 6b 6b 6b 6b 6b 6b 71 19 6f be dd 07 00 00 kkkkkkkkq.o¾Ý...
> Object 0xf53ab020: 71 19 6f be 6b 6b 6b 6b 6a 6b 6b eb 6b 6b 6b 6b q.o¾kkkkjkkëkkkk

What to notice is that this is offset hex(0xf53ab018-0xf53aafc0) = 0x58
from the beginning of the object (it would be nice to have SLUB print
that too, btw), which corresponds to (struct
task_struct).se.vruntime (the "se" is a struct sched_entity). I'm
putting Ingo and Peter on the Cc.

What I find odd is that only some of the bytes in there are wrong;
take the stray "eb" in the last line (above), for example. And the
variables around offset 0x58 from the start of struct task_struct are
all u64s. Is it possible that the corruption comes from somewhere else?

(Does the number look like a valid vruntime, for example?)
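The stray bytes and the overwritten value can both be examined mechanically using the last two dump lines. The Python sketch below does two things. First, it lists every byte that differs from the 0x6b poison, together with its XOR against 0x6b; the resulting first/last addresses reproduce the INFO range 0xf53ab018-0xf53ab02b. Second, it decodes the 8 bytes at 0xf53ab018 as a u64. The helper names mismatches and read_u64 are hypothetical, and the little-endian byte order is an assumption based on the 32-bit x86 look of the addresses.

```python
import struct

POISON_FREE = 0x6B  # the report says "instead of 0x6b"

# The last two "Object" lines of the quoted dump, keyed by start address.
# They are contiguous (16 bytes each), so they can be concatenated.
DUMP = {
    0xF53AB010: "6b 6b 6b 6b 6b 6b 6b 6b 71 19 6f be dd 07 00 00",
    0xF53AB020: "71 19 6f be 6b 6b 6b 6b 6a 6b 6b eb 6b 6b 6b 6b",
}


def mismatches(dump, expected=POISON_FREE):
    """Yield (address, byte, xor, flipped_bits) for every byte != expected."""
    for base, line in sorted(dump.items()):
        for i, byte in enumerate(bytes.fromhex(line)):
            if byte != expected:
                diff = byte ^ expected
                yield base + i, byte, diff, bin(diff).count("1")


def read_u64(dump, addr):
    """Read a u64 at addr, assuming little-endian (x86) byte order."""
    raw = b"".join(bytes.fromhex(dump[base]) for base in sorted(dump))
    (value,) = struct.unpack_from("<Q", raw, addr - min(dump))
    return value


for addr, byte, diff, nbits in mismatches(DUMP):
    print(f"{addr:#x}: {byte:02x}  xor {diff:02x}  ({nbits} bit(s) differ)")

value = read_u64(DUMP, 0xF53AB018)
print(f"u64 at 0xf53ab018: {value:#x} ({value / 1e9:.1f} s if read as ns)")
```

Both 0x6a and 0xeb differ from 0x6b in a single bit (0x01 and 0x80). The first eight overwritten bytes form the u64 0x7ddbe6f1971, which is about 8649 seconds if read as nanoseconds. Its low four bytes (71 19 6f be) reappear at 0xf53ab020.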

For the record,

$ grep SCHED .config
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
CONFIG_GROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
CONFIG_RT_GROUP_SCHED=y
CONFIG_USER_SCHED=y
# CONFIG_CGROUP_SCHED is not set
CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER=y
CONFIG_SCHED_SMT=y
CONFIG_SCHED_MC=y
CONFIG_SCHED_HRTICK=y
CONFIG_SCHED_DEBUG=y
CONFIG_SCHEDSTATS=y


Vegard

--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036