Re: [PATCH RFC 0/4] mm: Remember young bit for migration entries

From: Peter Xu
Date: Tue Aug 02 2022 - 18:16:05 EST


On Tue, Aug 02, 2022 at 10:59:42PM +0200, David Hildenbrand wrote:
> On 02.08.22 22:35, Peter Xu wrote:
> > On Tue, Aug 02, 2022 at 10:23:49PM +0200, David Hildenbrand wrote:
> >>> I don't think we only care about x86_64? Should other archs have the same
> >>> issue as long as there's the hardware young bit?
> >>>
> >>> Even without it, it'll affect page reclaim logic too, and that's also not
> >>> x86 only.
> >>
> >> Okay, reading the cover letter and looking at the code my understanding
> >> was that x86-64 is the real focus.
> >>
> >>>>
> >>>>>
> >>>>> Besides I actually have a question on the anon exclusive bit in the swap
> >>>>> pte: since we have that anyway, why we need a specific migration type for
> >>>>> anon exclusive pages? Can it be simply read migration entries with anon
> >>>>> exclusive bit set?
> >>>>
> >>>> Not before all arch support pte_swp_mkexclusive/pte_swp_exclusive/.
> >>>>
> >>>> As pte_swp_mkexclusive/pte_swp_exclusive/ only applies to actual swap
> >>>> PTEs, you could even reuse that bit for migration entries and get at
> >>>> alteast the most relevant 64bit architectures supported easily.
> >>>
> >>> Yes, but I think having two mechanisms for the single problem can confuse
> >>> people.
> >>>
> >>
> >> It would be one bit with two different meanings depending on the swp type.
> >>
> >>> IIUC the swap bit is already defined in major archs anyway, and since anon
> >>> exclusive bit is best-effort (or am I wrong?..), I won't worry too much on
> >>
> >> It kind-of is best effort, but the goal is to have all archs support it.
> >>
> >> ... just like the young bit here?
> >
> > Exactly, so I'm also wondering whether we can move the swp pte anon
> > exclusive bit into swp entry. It just sounds weird to have them defined in
> > two ways.
>
> I'd argue it's just the swp vs. nonswp difference that are in fact two
> different concepts (device+offset vs. type+pte). And some dirty details
> how swp entries are actually used.
>
> With swp entries you have to be very careful, for example, take a look
> at radix_to_swp_entry() and swp_to_radix_entry(). That made me refrain
> from touching anything inside actual swp entries and instead store it in
> the pte.

I don't really see any risk - it neither touches the swp entry nor do
complicated things around it (shift 1 and set bit 0 to 1). Please feel
free to share your concern in case I overlooked something, though.

>
> >
> >>
> >>> archs outside x86/arm/ppc/s390 on having anon exclusive bit lost during
> >>> migrations, because afaict the whole swap type of ANON_EXCLUSIVE_READ is
> >>> only servicing that very minority.. which seems to be a pity to waste the
> >>
> >> I have a big item on my todo list to support all, but I have different
> >> priorities right now.
> >>
> >> If there is no free bit, simply steal one from the offset ... which is
> >> the same thing your approach would do, just in a different way, no?
> >>
> >>> swp type on all archs even if the archs defined swp pte bits just for anon
> >>> exclusive.
> >>
> >> Why do we care? We walk about one type not one bit.
> >
> > The swap type address space is still limited, I'd say we should save when
> > possible. I believe people caring about swapping care about the limit of
> > swap devices too. If the offset can keep it, I think it's better than the
>
> Ehm, last time I did the math I came to the conclusion that nobody
> cares. Let me redo the math:
>
> MAX_SWAPFILES = 1<<5 - 1 - 1 - 4 - 3 - 1 = 22
>
> Which is the worst case right now with all kinds of oddity compiled in
> (sorry CONFIG_DEVICE_PRIVATE).
>
> So far nobody complaint.

Yeah. To me using one bit of it is fine especially if that's the best to
do. Here what confuses me is we have two ways to represent "whether this
page is anon exclusive" in a swap pte, either we need to go into the type
or using the bit. The trick here is whether the swap pte bit makes sense
depends on the swp type first too, while the swap type can be "anon
exclusive read migration" itself.

>
> > swap type. De-dup either the type or the swap pte bit would be nicer, imho.
> >
>
> If you manage bits in the pte manually, you might be able to get a
> better packing density, if bits are scattered around. Just take a look
> at the x86_64 location of _PAGE_SWP_EXCLUSIVE.
>
> What I'm rooting for is something like
>
> #define pte_nonswp_mkyoung pte_swp_mkexclusive
>
> Eventually with some VM_BUG_ONs to make sure people call it on the right
> swp ptes.
>
> If we ever want to get rid of SWP_MIGRATION_READ_EXCLUSIVE (so people
> can have 23 swap devices), and eventually have separate bits for both.
> For now it's not necessary.

Okay, but that's probably the last thing I'd like to try - I don't want to
further complicate the anon exclusive bit in swap pte.. I'd think cleaning
that up somehow would be nice, but as you mentioned if no one complains I
have no strong opinion either.

Thanks,

--
Peter Xu