Re: [PATCH v18 2/5] fs/proc/task_mmu: Implement IOCTL to get and optionally clear info about PTEs

From: Michał Mirosław
Date: Wed Jun 21 2023 - 09:25:11 EST


On Wed, 21 Jun 2023 at 06:44, Muhammad Usama Anjum
<usama.anjum@xxxxxxxxxxxxx> wrote:
> On 6/21/23 3:05 AM, Michał Mirosław wrote:
> > On Tue, 20 Jun 2023 at 13:16, Muhammad Usama Anjum
> > <usama.anjum@xxxxxxxxxxxxx> wrote:
> >> On 6/19/23 1:16 PM, Michał Mirosław wrote:
> >>> On Fri, 16 Jun 2023 at 08:57, Muhammad Usama Anjum
> >>> <usama.anjum@xxxxxxxxxxxxx> wrote:
> >>>>
> >>>> On 6/16/23 1:07 AM, Michał Mirosław wrote:
> >>>>> On Thu, 15 Jun 2023 at 17:11, Muhammad Usama Anjum
> >>>>> <usama.anjum@xxxxxxxxxxxxx> wrote:
> >>>>>> On 6/15/23 7:52 PM, Michał Mirosław wrote:
> >>>>>>> On Thu, 15 Jun 2023 at 15:58, Muhammad Usama Anjum
> >>>>>>> <usama.anjum@xxxxxxxxxxxxx> wrote:
> >>>>>>>> I'll send next revision now.
> >>>>>>>> On 6/14/23 11:00 PM, Michał Mirosław wrote:
> >>>>>>>>> (A quick reply to answer open questions in case they help the next version.)
> >>> [...]
> >>>>>>>>> I guess this will be reworked anyway, but I'd prefer this didn't need
> >>>>>>>>> custom errors etc. If we agree to decoupling the selection and GET
> >>>>>>>>> output, it could be:
> >>>>>>>>>
> >>>>>>>>> bool is_interesting_page(p, flags); // this one does the
> >>>>>>>>> required/anyof/excluded match
> >>>>>>>>> size_t output_range(p, start, len, flags); // this one fills the
> >>>>>>>>> output vector and returns how many pages were fit
> >>>>>>>>>
> >>>>>>>>> In this setup, `is_interesting_page() && (n_out = output_range()) <
> >>>>>>>>> n_pages` means this is the final range, no more will fit. And if
> >>>>>>>>> `n_out == 0` then no pages fit and no WP is needed (no other special
> >>>>>>>>> cases).
> >>>>>>>> Right now, pagemap_scan_output() performs the work of both of these two
> >>>>>>>> functions. The part can be broken into is_interesting_pages() and we can
> >>>>>>>> leave the remaining part as it is.
> >>>>>>>>
> >>>>>>>> Saying that n_out < n_pages tells us the buffer is full covers one case.
> >>>>>>>> But there is also the case where the maximum number of pages has been
> >>>>>>>> found and the walk needs to be aborted.
> >>>>>>>
> >>>>>>> This case is exactly what `n_out < n_pages` will cover (if scan_output
> >>>>>>> uses max_pages properly to limit n_out).
> >>>>>>> Isn't it that when the buffer is full we want to abort the scan always
> >>>>>>> (with WP if `n_out > 0`)?
> >>>>>> Wouldn't it be duplication of condition if buffer is full inside
> >>>>>> pagemap_scan_output() and just outside it. Inside pagemap_scan_output() we
> >>>>>> check if we have space before putting data inside it. I'm using this same
> >>>>>> condition to indicate that buffer is full.
> >>>>>
> >>>>> I'm not sure what you mean. The buffer-full conditions would be
> >>>>> checked in ..scan_output() and communicated to the caller by returning
> >>>>> N less than `n_pages` passed in. This is exactly how e.g. read()
> >>>>> works: if you get less than requested you've hit the end of the file.
> >>>>> If the file happens to have size that is equal to the provided buffer
> >>>>> length, the next read() will return 0.
> >>>> Right now we have:
> >>>>
> >>>> pagemap_scan_output():
> >>>> if (p->vec_buf_index >= p->vec_buf_len)
> >>>> return PM_SCAN_BUFFER_FULL;
> >>>> if (p->found_pages == p->max_pages)
> >>>> return PM_SCAN_FOUND_MAX_PAGES;
> >>>
> >>> Why do you need to differentiate between those cases?
> >>>
> >>>> pagemap_scan_pmd_entry():
> >>>> ret = pagemap_scan_output(bitmap, p, start, n_pages);
> >>>> if (ret >= 0) // success
> >>>> make_UFFD_WP and flush
> >>>> else
> >>>> buffer_error
> >>>>
> >>>> You are asking me to do:
> >>>>
> >>>> pagemap_scan_output():
> >>>> if (p->vec_buf_index >= p->vec_buf_len)
> >>>> return 0;
> >>>
> >>>> if (p->found_pages == p->max_pages)
> >>>> return PM_SCAN_FOUND_MAX_PAGES;
> >>>
> >>> This should be instead:
> >>>
> >>> n_pages = min(p->max_pages - p->found_pages, n_pages);
> >>> ...
> >>> return n_pages;
> >> You are missing the optimization here that we check for full buffer every
> >> time adding to user buffer. This was added to remove extra iteration of
> >> page walk if buffer is full already. The way you are suggesting will remove it.
> >>
> >> So you are returning remaining pages to be found now. This doesn't seem
> >> right. If max_pages is 520, found_pages is 0 and n_pages is 512 before
> >> calling pagemap_scan_output(). found_pages would become 512 after adding
> >> 512 pages to output buffer. But n_pages would return 8 instead of 512. You
> >> were saying we should return the number of pages added to the output buffer.
> >
> > Ok, if we want this optimization, then i'd rework it so that we have:
> >
> > bool pagemap_scan_output(..., int *n_pages)
> > {
> > limit n_pages;
> > ...
> > return have_more_room_in_output;
> > }
> This is getting closer and closer to what I have in the code. The only
> difference now is that you are asking me to not return the buffer full
> status from inside this function and instead there should be an input+output
> pointer to n_pages and the caller would return the buffer full status. As
> compared to the suggestion, the current form looks simpler. My earlier
> point (
> https://lore.kernel.org/all/2e1b80f1-0385-0674-ae5f-9703a6ef975d@xxxxxxxxxxxxx
> ) is valid again. I don't want to bring logic out of pagemap_scan_output().
> This is an internal function. There could be a thousand ways how internal code
> can be written. I've really liked so many optimizations which you have
> advised. This isn't something worth doing. It would increase lines of code
> with no added readability benefit.

Yes, I am trying to suggest a minimal change. The benefit is that you
don't need special error values anymore, so there is less cognitive
load in following the code flow. The idea is not strictly to save on
lines typed, but to keep the information needed as local as possible.
Also, BUFFER_FULL and FOUND_MAX_PAGES differ only in which criterion
was hit; otherwise the code should behave the same way.
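
To make the shape concrete, here is a minimal stand-alone sketch in
user-space C, not the patch's code. `struct scan_state` and
`scan_output()` are hypothetical names; only the four fields mentioned
in this thread are modelled, and each call is assumed to consume exactly
one output-vector slot (the real code may merge adjacent ranges).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the scan's private state. */
struct scan_state {
	size_t vec_buf_index;	/* next free slot in the output vector */
	size_t vec_buf_len;	/* capacity of the output vector */
	size_t found_pages;	/* pages reported so far */
	size_t max_pages;	/* caller's limit on reported pages */
};

/*
 * Clamp *n_pages to what still fits, account for it, and return whether
 * there is room for more output. Both limits are handled the same way,
 * so no special error values are needed.
 */
static bool scan_output(struct scan_state *p, size_t *n_pages)
{
	size_t room;

	assert(p->vec_buf_index <= p->vec_buf_len);
	assert(p->found_pages <= p->max_pages);

	if (p->vec_buf_index == p->vec_buf_len) {
		/* No slot left: nothing fits. */
		*n_pages = 0;
		return false;
	}

	room = p->max_pages - p->found_pages;
	if (*n_pages > room)
		*n_pages = room;

	if (*n_pages > 0) {
		/* Assumption: one range entry per call. */
		p->vec_buf_index++;
		p->found_pages += *n_pages;
	}

	return p->vec_buf_index < p->vec_buf_len &&
	       p->found_pages < p->max_pages;
}
```

The caller would then write-protect only when `*n_pages > 0` and stop
the walk as soon as the return value is false. With max_pages = 520 and
two calls of 512 pages each, the first accepts 512 and the second
accepts the remaining 8 and reports that no room is left.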

Best Regards
Michał Mirosław