Re: mm: BUG in unmap_page_range
From: Mel Gorman
Date:  Tue Sep 09 2014 - 17:33:29 EST
On Mon, Sep 08, 2014 at 01:56:55PM -0400, Sasha Levin wrote:
> On 09/08/2014 01:18 PM, Mel Gorman wrote:
> > A worse possibility is that somehow the lock is getting corrupted but
> > that's also a tough sell considering that the locks should be allocated
> > from a dedicated cache. I guess I could try breaking that to allocate
> > one page per lock so DEBUG_PAGEALLOC triggers but I'm not very
> > optimistic.
> 
> I did see ptl corruption couple days ago:
> 
> 	https://lkml.org/lkml/2014/9/4/599
> 
> Could this be related?
> 
Possibly, although the likely explanation then would be that there is
just general corruption coming from somewhere. Even using your config
and applying a patch to make linux-next boot (already in Tejun's tree),
I was unable to reproduce the problem after running for several hours. I
had to run trinity on tmpfs because ext4 and xfs blew up almost
immediately, so I have a few questions.
1. What filesystem are you using?
2. What compiler are you using, in case it's an experimental one? I ask
   because I think I saw a patch from you adding support so that the
   kernel would build with gcc 5.
3. Does your hardware support TSX or anything similarly funky that would
   potentially affect locking?
4. How many sockets are on your test machine, in case reproducing it
   depends on a machine large enough to open a timing race?
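For questions 3 and 4, something along these lines would be enough to
collect the answers. It is only a rough sketch and assumes an x86 Linux
box where /proc/cpuinfo lists "hle" and "rtm" in the flags line when TSX
is exposed, and reports one "physical id" per socket. The function name
parse_cpuinfo is made up for illustration.

```python
import os


def parse_cpuinfo(text):
    """Return (tsx_flags, socket_count) from /proc/cpuinfo-style text.

    Assumption: TSX shows up as the "hle" and/or "rtm" CPU flags, and
    each socket has a distinct "physical id" value (x86 layout).
    """
    tsx = set()
    sockets = set()
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between CPUs
        key = key.strip()
        if key == "flags":
            tsx |= {"hle", "rtm"} & set(value.split())
        elif key == "physical id":
            sockets.add(value.strip())
    return tsx, len(sockets)


if __name__ == "__main__":
    path = "/proc/cpuinfo"
    if os.path.exists(path):
        with open(path) as f:
            tsx, nsockets = parse_cpuinfo(f.read())
        print("TSX flags:", " ".join(sorted(tsx)) or "none")
        print("sockets:", nsockets)
```

A socket count of 0 just means the "physical id" field was not present,
which can happen in some virtual machines.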
As I'm drawing a blank on what would trigger the bug, I'm hoping I can
reproduce this locally and experiment a bit.
Thanks.
-- 
Mel Gorman
SUSE Labs