Re: [PATCH RFC 3/6] kexec: export PG_offline to VMCOREINFO

From: David Hildenbrand
Date: Thu Nov 15 2018 - 07:01:25 EST


On 15.11.18 12:52, Borislav Petkov wrote:
> On Thu, Nov 15, 2018 at 12:20:40PM +0100, David Hildenbrand wrote:
>> Sorry to say, but that is the current practice without which
>> makedumpfile would not be able to work at all. (exclude user pages,
>> exclude page cache, exclude buddy pages). Let's not reinvent the wheel
>> here. This is how dumping works forever.
>
> Sorry, but "we've always done this in the past" doesn't make it better.

Just saying that "I'm not the first to do it, don't hit me with a stick" :)

>
>> I don't see how there should be "set of pages which do not have
>> PG_offline".
>
> It doesn't have to be a set of pages. Think a (mmconfig perhaps) region
> which the kdump kernel should completely skip because poking in it in
> the kdump kernel, causes all kinds of havoc like machine checks. etc.
> We've had and still have one issue like that.

Indeed. And we still have without makedumpfile. I think you are aware of
this, but I'll explain it just for consistency: PG_hwpoison

At some point we detect a HW error and mask a page as PG_hwpoison.

makedumpfile knows how to treat that flag and can exclude it from the
dump (== not access it). No crash.

kdump itself has no clue about old "struct pages". Especially:
a) Where they are located in memory (e.g. SPARSE)
b) What their format is ("where are the flags")
c) What the meaning of flags is ("what does bit X mean")

In order to know such information, we would have to do parsing of quite
some information inside the kernel in kdump. Basically what makedumpfile
does just now. Is this feasible? I don't think so.

So we would need another approach to communicate such information as you
said. I can't think of any, but if anybody reading this has an idea,
please speak up. I am interested.


The *only* way right now we would have to handle such scenarios:

1. While dumping memory and we get a machine check, fake reading a zero
page instead of crashing.
2. While dumping memory and we get a fault, fake reading a zero page
instead of crashing.

>
> But let me clarify my note: I don't want to be discussing with you the
> design of makedumpfile and how it should or should not work - that ship
> has already sailed. Apparently there are valid reasons to do it this
> way.

Indeed, and the basic design is to export these flags. (let's say
"unfortunately", being able to handle such stuff in kdump directly would
be the dream).

> I was *simply* stating that it feels wrong to export mm flags like that.
>
> But as I said already, that is mm guys' call and looking at how we're
> already exporting a bunch of stuff in the vmcoreinfo - including other
> mm flags - I guess one more flag doesn't matter anymore.

Fair enough, noted. If you have an idea how to handle this in kdump,
please let me know.

--

Thanks,

David / dhildenb