Re: [PATCH net-next v2 09/15] mm: page_frag: reuse MSB of 'size' field for pfmemalloc

From: Yunsheng Lin
Date: Mon May 06 2024 - 08:34:14 EST


On 2024/4/30 22:54, Alexander Duyck wrote:
> On Tue, Apr 30, 2024 at 5:06 AM Yunsheng Lin <linyunsheng@xxxxxxxxxx> wrote:
>>
>> On 2024/4/29 22:49, Alexander Duyck wrote:
>>
>> ...
>>
>>>>>
>>>>
>>>> After considering a few different layouts for 'struct page_frag_cache',
>>>> it seems the below is more optimized:
>>>>
>>>> struct page_frag_cache {
>>>> /* page address & pfmemalloc & order */
>>>> void *va;
>>>
>>> I see. So basically just pack the much smaller bitfields in here.
>>>
>>>>
>>>> #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) && (BITS_PER_LONG <= 32)
>>>> u16 pagecnt_bias;
>>>> u16 size;
>>>> #else
>>>> u32 pagecnt_bias;
>>>> u32 size;
>>>> #endif
>>>> }
>>>>
>>>> The lower bits of 'va' is or'ed with the page order & pfmemalloc instead
>>>> of offset or pagecnt_bias, so that we don't have to add more checking
>>>> for handling the problem of not having enough space for offset or
>>>> pagecnt_bias for PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE and 32 bits system.
>>>> And page address & pfmemalloc & order is unchanged for the same page
>>>> in the same 'page_frag_cache' instance, it makes sense to fit them
>>>> together.
>>>>
>>>> Also, it seems it is better to replace 'offset' with 'size', which indicates
>>>> the remaining size for the cache in a 'page_frag_cache' instance, and we
>>>> might be able to do a single 'size >= fragsz' checking for the case of cache
>>>> being enough, which should be the fast path if we ensure size is zoro when
>>>> 'va' == NULL.
>>>
>>> I'm not sure the rename to size is called for as it is going to be
>>> confusing. Maybe something like "remaining"?
>>
>> Yes, using 'size' for that is a bit confusing.
>> Beside the above 'remaining', by googling, it seems we may have below
>> options too:
>> 'residual','unused', 'surplus'
>>
>>>
>>>> Something like below:
>>>>
>>>> #define PAGE_FRAG_CACHE_ORDER_MASK GENMASK(1, 0)
>>>> #define PAGE_FRAG_CACHE_PFMEMALLOC_BIT BIT(2)
>>>
>>> The only downside is that it is ossifying things so that we can only
>>
>> There is 12 bits that is always useful, we can always extend ORDER_MASK
>> to more bits if lager order number is needed.
>>
>>> ever do order 3 as the maximum cache size. It might be better to do a
>>> full 8 bytes as on x86 it would just mean accessing the low end of a
>>> 16b register. Then you can have pfmemalloc at bit 8.
>>
>> I am not sure I understand the above as it seems we may have below option:
>> 1. Use somthing like the above ORDER_MASK and PFMEMALLOC_BIT.
>> 2. Use bitfield as something like below:
>>
>> unsigned long va:20;---or 52 for 64bit system
>> unsigned long pfmemalloc:1
>> unsigned long order:11;
>>
>> Or is there a better idea in your mind?
>
> All I was suggesting was to make the ORDER_MASK 8 bits. Doing that the
> compiler can just store VA in a register such as RCX and instead of
> having to bother with a mask it could then just use CL to access the
> order. As far as the flag bits such as pfmemalloc the 4 bits starting
> at 8 would make the most sense since it doesn't naturally align to
> anything and would have to be masked anyway.

Ok.

>
> Using a bitfield like you suggest would be problematic as the compiler
> would assume a shift is needed so you would have to add a shift to
> your code to offset it for accessing VA.
>
>>>
>>>> struct page_frag_cache {
>>>> /* page address & pfmemalloc & order */
>>>> void *va;
>>>>
>>>
>>> When you start combining things like this we normally would convert va
>>> to an unsigned long.
>>
>> Ack.

It seems it makes more sense to convert va to something like 'struct encoded_va'
mirroring 'struct encoded_page' in below:

https://elixir.bootlin.com/linux/v6.7-rc8/source/include/linux/mm_types.h#L222

>>
>>>
>>>> #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) && (BITS_PER_LONG <= 32)
>>>> u16 pagecnt_bias;
>>>> u16 size;
>>>> #else
>>>> u32 pagecnt_bias;
>>>> u32 size;
>>>> #endif
>>>> };
>>>>
>>>>