Re: runtime regression with "x86/mm/pat: Emulate PAT when it is disabled"

From: Paul Gortmaker
Date: Mon Mar 07 2016 - 22:28:47 EST


[Re: runtime regression with "x86/mm/pat: Emulate PAT when it is disabled"] On 07/03/2016 (Mon 18:35) Toshi Kani wrote:

> On Mon, 2016-03-07 at 17:56 -0700, Toshi Kani wrote:
> > On Mon, 2016-03-07 at 18:53 -0500, Paul Gortmaker wrote:
> > > [Re: runtime regression with "x86/mm/pat: Emulate PAT when it is
> > > disabled"] On 07/03/2016 (Mon 16:38) Toshi Kani wrote:
> > >
> > > > On Mon, 2016-03-07 at 16:08 -0500, Paul Gortmaker wrote:
> > > > > [dropping oe list and lkml since attaching dmesg files.]
> > > > >
> > >
> > > [...]
> > >
> > > > > > Yes, please send me full dmesg files.  Since I do not know your
> > > > > > original state, the diff does not give me the whole picture.
> > > > >
> > > > > Attached.
> > > >
> > > > Thanks for the dmesg files!  As I suspected, there is no message from
> > > > pat_init() in both cases.  That is, you are missing the following
> > > > message, which shows how PAT is configured to support cache attributes.
> > > >
> > > > # dmesg | grep PAT
> > > > [0.000000] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WC  UC- WT
> > >
> > > Interesting...
> > >
> > > >
> > > > It may have seemed to work before, but without pat_init() being called,
> > > > WC was never configured in PAT.  There was not a proper check in
> > > > place to detect this error before.  Can you please check your code to
> > > > see what caused this skip of pat_init()?  If you have a git tree, I can
> > > > take a look as well.
> > >
> > > You already have git copies of what I'm running, since it is vanilla
> > > mainline commits.  No code changes at this end whatsoever.  I did the
> > > bisect on vanilla mainline.  All I took from yocto was their ".config"
> > >
> > > To recap, v4.1-rc5-21-g9dac62909451 works,  v4.1-rc5-22-g9cd25aac1f44
> > > fails, and v4.5-rc6 also fails.  If pat_init() isn't called then this
> > > is a bug in current mainline.  I'll have a look later myself and see
> > > if I can trace out how we expect to get to pat_init() and how that
> > > might be skipped inadvertently unless someone beats me to it.
> >
> > Oh, I see.  Can you send me the ".config" file?
>
> And also an output of /proc/cpuinfo, please?

Host? Guest? Both?

>
> I think I know what's going on.  I noticed that you have the following
> message in your dmesg files.
>
>  [    0.000000] MTRR: Disabled
>
> MTRR is set to disabled when your CPU is Intel but does not support MTRR.

I've run the test on a modern expensive Xeon, a 4-5 year old Xeon, and
on an old Pentium Dual-Core (the cheaper, dumbed-down Core 2 Duo that doesn't
support virtualization) from around 2007. In all cases the result was
the same. Perhaps that is because the QEMU launch script appears to set
the CPU type regardless? (It uses "-cpu qemu32", but I confess that I do
not know exactly what silicon that tries to emulate.)

>  Perhaps, QEMU does not emulate MTRR?

I will be the first to admit that I am not a seasoned QEMU user, so I've
no idea if the above is true. I still prefer testing on real hardware,
even if that comes across as "old school". :)
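
For what it's worth, one way to check this from inside the guest is
sketched below. It is only a rough sketch, not anything from the kernel
tree: it assumes the guest's /proc/cpuinfo has the usual x86 "flags" line
(where "mtrr" and "pat" appear when the CPU advertises those features),
and it looks in dmesg for the two message strings quoted in this thread.
The function names are made up for illustration, and the exact dmesg
wording may differ on other kernel versions.

```python
import re
import subprocess


def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def pat_mtrr_summary(cpuinfo_text, dmesg_text):
    """Summarize what the CPU advertises and what the kernel logged at boot."""
    flags = cpu_flags(cpuinfo_text)
    return {
        # Feature bits the (possibly emulated) CPU reports.
        "cpu_has_mtrr": "mtrr" in flags,
        "cpu_has_pat": "pat" in flags,
        # Message strings taken from this thread; assumed stable for this kernel.
        "pat_configured": "x86/PAT: Configuration" in dmesg_text,
        "mtrr_disabled": re.search(r"MTRR: Disabled", dmesg_text) is not None,
    }


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        cpuinfo = f.read()
    # Reading dmesg may need root, depending on kernel.dmesg_restrict.
    dmesg = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
    for name, value in pat_mtrr_summary(cpuinfo, dmesg).items():
        print(f"{name}: {value}")
```

If "cpu_has_mtrr" comes back False inside the guest while "cpu_has_pat"
is True, that would be consistent with the "-cpu qemu32" model simply not
advertising MTRR, independent of the host silicon.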

>
> pat_init() is not called when MTRR is disabled.  I think this dependency is
> wrong, and it needs to be fixed.
>
> This issue has been there for a long time, and you have been running
> essentially as PAT disabled in the past.  The commit in question simply
> detected this issue.

OK, that sounds good -- in that it seems we are finally getting to the
bottom of what happened here. Any thoughts on why built-in vs. modular
somehow managed to mask the issue?

Paul.
--

>
> Thanks,
> -Toshi