Re: [PATCH v2 1/1] fs/splice: add missing callback for inaccessible pages

From: Christian Borntraeger
Date: Tue May 05 2020 - 10:01:46 EST




On 05.05.20 15:55, Ulrich Weigand wrote:
> On Tue, May 05, 2020 at 05:34:45AM -0700, Dave Hansen wrote:
>> On 5/4/20 6:41 AM, Ulrich Weigand wrote:
>>> You're right that there is no mechanism to prevent new references,
>>> but that's really never been the goal either. We're simply trying
>>> to ensure that no I/O is ever done on a page that is in the "secure"
>>> (or inaccessible) state. To do so, we rely on the assumption that
>>> all code that starts I/O on a page cache page will *first*:
>>> - mark the page as pending I/O by either taking an extra page
>>> count, or by setting the Writeback flag; then:
>>> - call arch_make_page_accessible(); then:
>>> - start I/O; and only after I/O has finished:
>>> - remove the "pending I/O" marker (Writeback and/or extra ref)
>>
>> Let's ignore writeback for a moment because get_page() is the more
>> general case. The locking sequence is:
>>
>> 1. get_page() (or equivalent) "locks out" a page from converting to
>> inaccessbile,
>> 2. followed by a make_page_accessible() guarantees that the page
>> *stays* accessible until
>> 3. I/O is safe in this region
>> 4. put_page(), removes the "lock out", I/O now unsafe
>
> Yes, exactly.
>
>> They key is, though, the get_page() must happen before
>> make_page_accessible() and *every* place that acquires a new reference
>> needs a make_page_accessible().
>
> Well, sort of: every place that acquires a new reference *and then
> performs I/O* needs a make_page_accessible(). There seem to be a
> lot of plain get_page() calls that aren't related to I/O.
>
>> try_get_page() is obviously one of those "new reference sites" and it
>> only has one call site outisde of the gup code: generic_pipe_buf_get(),
>> which is effectively patched by the patch that started this thread. The
>> fact that this one oddball site _and_ gup are patched now is a good sign.
>>
>> *But*, I still don't know how that could work nicely:
>>
>>> static inline __must_check bool try_get_page(struct page *page)
>>> {
>>> page = compound_head(page);
>>> if (WARN_ON_ONCE(page_ref_count(page) <= 0))
>>> return false;
>>> page_ref_inc(page);
>>> return true;
>>> }
>>
>> If try_get_page() collides with a freeze_page_refs(), it'll hit the
>> WARN_ON_ONCE(), which is surely there for a good reason. I'm not sure
>> that warning is _actually_ valid since freeze_page_refs() isn't truly a
>> 0 refcount. But, the fact that this hasn't been encountered means that
>> the testing here is potentially lacking.
>
> This is indeed interesting. In particular if you compare try_get_page
> with try_get_compound_head in gup.c, which does instead:
>
> if (WARN_ON_ONCE(page_ref_count(head) < 0))
> return NULL;
>
> which seems more reasonable to me, given the presence of the
> page_ref_freeze method. So I'm not sure why try_get_page has <= 0.


Just looked at
commit 88b1a17dfc3ed7728316478fae0f5ad508f50397 mm: add 'try_get_page()' helper function

which says:
Also like 'get_page()', you can't use this function unless you already
had a reference to the page. The intent is that you can use this
exactly like get_page(), but in situations where you want to limit the
maximum reference count.

The code currently does an unconditional WARN_ON_ONCE() if we ever hit
the reference count issues (either zero or negative), as a notification
that the conditional non-increment actually happened.

If try_get_page must not be called with an existing reference, that means
that when we call it the page reference is already higher and our freeze
will never succeed. That would imply that we cannot trigger this. No?