Re: crisv32 runtime failure in -next due to 'page-flags: define behavior SL*B-related flags on compound pages'

From: Kirill A. Shutemov
Date: Tue Sep 22 2015 - 08:04:11 EST


On Mon, Sep 21, 2015 at 06:17:34PM -0700, Guenter Roeck wrote:
> On 09/21/2015 08:34 AM, Kirill A. Shutemov wrote:
> >Guenter Roeck wrote:
> >>On 09/18/2015 07:53 AM, Jesper Nilsson wrote:
> >>>On Fri, Sep 18, 2015 at 05:25:07PM +0300, Kirill A. Shutemov wrote:
> >>>>On Thu, Sep 17, 2015 at 09:29:27AM -0700, Guenter Roeck wrote:
> >>>>>Hi,
> >>>>>
> >>>>>my crisv32 qemu test fails with next-20150917 as follows.
> >>>>>
> >>>>>NET: Registered protocol family 16
> >>>>>kernel BUG at mm/slab.c:1648!
> >>>>>Linux 4.3.0-rc1-next-20150917 #1 Wed Sep 16 23:56:59 PDT 2015
> >>>>>Oops: 0000
> >>>>>
> >>>>>[ register dump follows ]
> >>>>>
> >>>>>See http://server.roeck-us.net:8010/builders/qemu-crisv32-next/builds/83/steps/qemubuildcommand/logs/stdio
> >>>>>for a complete log.
> >>>>
> >>>>Is there a chance to get proper backtrace?
> >>>
> >>>Yes, it should be possible with CONFIG_KALLSYMS=y in the kconfig.
> >>>
> >>
> >>Good to know. I added it to my configuration.
> >>
> >>Here it is:
> >>
> >>kernel BUG at mm/slab.c:1648!
> >
> >I still don't understand what's going on :(
> >Could you try with this instrumentation:
> >
> >diff --git a/mm/slab.c b/mm/slab.c
> >index ce9c6531e6f7..10035d1a06d3 100644
> >--- a/mm/slab.c
> >+++ b/mm/slab.c
> >@@ -1645,7 +1645,11 @@ static void kmem_freepages(struct kmem_cache *cachep, struct page *page)
> > 		sub_zone_page_state(page_zone(page),
> > 				NR_SLAB_UNRECLAIMABLE, nr_freed);
> >
> >-	BUG_ON(!PageSlab(page));
> >+	if (!PageSlab(page)) {
> >+		dump_page(page, "page");
> >+		dump_page(compound_head(page), "compound_head(page)");
> >+		BUG();
> >+	}
> > 	__ClearPageSlabPfmemalloc(page);
> > 	__ClearPageSlab(page);
> > 	page_mapcount_reset(page);
> >
>
> page:c04a5340 count:1 mapcount:1 mapping:c1f34080 index:0xc1f34060
> flags: 0x80(slab)
> page dumped because: page
> page:c1f17a04 count:0 mapcount:1 mapping:00d13600 index:0xc0
> flags: 0x0()
> page dumped because: compound_head(page)
>
> Does that help ?

Kinda. It's a false positive from PageTail(), caused by the low bit being
set in page->rcu_head.next.
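
To make the aliasing concrete, here is a small user-space model. It is only a sketch, not the real struct page: fake_rcu_head, fake_page and the fake_*() helpers are made-up names, and it assumes the tail marker is bit 0 of a word that shares storage with rcu_head.next, with compound_head() clearing that bit to find the head. Under those assumptions, the value 0xc1f17a05 would be consistent with the second dump above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rcu_head. */
struct fake_rcu_head {
	struct fake_rcu_head *next;
	void (*func)(struct fake_rcu_head *);
};

/* Hypothetical stand-in for the relevant part of struct page. */
struct fake_page {
	union {
		struct fake_rcu_head rcu_head;
		uintptr_t compound_head;	/* assumed: bit 0 set means "tail page" */
	};
};

/* Model of the tail test: it only looks at bit 0 of the shared word. */
static int fake_page_tail(const struct fake_page *page)
{
	return (int)(page->compound_head & 1);
}

/* Model of the head lookup: clear bit 0 if the page looks like a tail. */
static uintptr_t fake_compound_head(const struct fake_page *page)
{
	if (fake_page_tail(page))
		return page->compound_head - 1;
	return (uintptr_t)page;
}

/* Pretend the page was queued in front of an rcu_head located at next_addr. */
static void fake_queue_before(struct fake_page *page, uintptr_t next_addr)
{
	page->rcu_head.next = (struct fake_rcu_head *)next_addr;
}
```

With an even next_addr the page is left alone; with an odd one it is mistaken for a tail page and the "head" is just the neighbouring rcu_head address minus one, i.e. garbage.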

It happens (at least) due to broken alignment of the 'rcu' field within
task_struct: offsetof(struct task_struct, rcu) is 773.
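
The arithmetic, as a sketch: if the containing object starts at an even address, an odd member offset puts the member at an odd address, so any pointer to it has bit 0 set. The helpers and the base address in the usage note are hypothetical; only the offset 773 comes from the report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* offsetof(struct task_struct, rcu) as reported in this thread. */
#define REPORTED_RCU_OFFSET 773u

/* Address of a member given the object's base address and the member offset. */
static uintptr_t member_address(uintptr_t base, size_t offset)
{
	return base + offset;
}

/* Nonzero if bit 0 of the address is set. */
static int low_bit_set(uintptr_t addr)
{
	return (int)(addr & 1u);
}

/* Nonzero if the offset is a multiple of the pointer size. */
static int offset_is_pointer_aligned(size_t offset)
{
	return offset % sizeof(void *) == 0;
}
```

For example, with a made-up base of 0xc1f17000, member_address(base, REPORTED_RCU_OFFSET) is odd, and offset_is_pointer_aligned(REPORTED_RCU_OFFSET) is false for both 4-byte and 8-byte pointers.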

That looks very broken. I would guess the compiler does something horribly
wrong. I hope it's not an ABI issue. :-/

Mikael? Jesper?

--
Kirill A. Shutemov