Re: [PATCH v2 1/1] fs/splice: add missing callback for inaccessible pages

From: Christian Borntraeger
Date: Tue May 05 2020 - 10:04:13 EST




On 05.05.20 16:01, Christian Borntraeger wrote:
>
>
> On 05.05.20 15:55, Ulrich Weigand wrote:
>> On Tue, May 05, 2020 at 05:34:45AM -0700, Dave Hansen wrote:
>>> On 5/4/20 6:41 AM, Ulrich Weigand wrote:
>>>> You're right that there is no mechanism to prevent new references,
>>>> but that's really never been the goal either. We're simply trying
>>>> to ensure that no I/O is ever done on a page that is in the "secure"
>>>> (or inaccessible) state. To do so, we rely on the assumption that
>>>> all code that starts I/O on a page cache page will *first*:
>>>> - mark the page as pending I/O by either taking an extra page
>>>> count, or by setting the Writeback flag; then:
>>>> - call arch_make_page_accessible(); then:
>>>> - start I/O; and only after I/O has finished:
>>>> - remove the "pending I/O" marker (Writeback and/or extra ref)
>>>
>>> Let's ignore writeback for a moment because get_page() is the more
>>> general case. The locking sequence is:
>>>
>>> 1. get_page() (or equivalent) "locks out" a page from converting to
>>> inaccessbile,
>>> 2. followed by a make_page_accessible() guarantees that the page
>>> *stays* accessible until
>>> 3. I/O is safe in this region
>>> 4. put_page(), removes the "lock out", I/O now unsafe
>>
>> Yes, exactly.
>>
>>> They key is, though, the get_page() must happen before
>>> make_page_accessible() and *every* place that acquires a new reference
>>> needs a make_page_accessible().
>>
>> Well, sort of: every place that acquires a new reference *and then
>> performs I/O* needs a make_page_accessible(). There seem to be a
>> lot of plain get_page() calls that aren't related to I/O.
>>
>>> try_get_page() is obviously one of those "new reference sites" and it
>>> only has one call site outisde of the gup code: generic_pipe_buf_get(),
>>> which is effectively patched by the patch that started this thread. The
>>> fact that this one oddball site _and_ gup are patched now is a good sign.
>>>
>>> *But*, I still don't know how that could work nicely:
>>>
>>>> static inline __must_check bool try_get_page(struct page *page)
>>>> {
>>>> page = compound_head(page);
>>>> if (WARN_ON_ONCE(page_ref_count(page) <= 0))
>>>> return false;
>>>> page_ref_inc(page);
>>>> return true;
>>>> }
>>>
>>> If try_get_page() collides with a freeze_page_refs(), it'll hit the
>>> WARN_ON_ONCE(), which is surely there for a good reason. I'm not sure
>>> that warning is _actually_ valid since freeze_page_refs() isn't truly a
>>> 0 refcount. But, the fact that this hasn't been encountered means that
>>> the testing here is potentially lacking.
>>
>> This is indeed interesting. In particular if you compare try_get_page
>> with try_get_compound_head in gup.c, which does instead:
>>
>> if (WARN_ON_ONCE(page_ref_count(head) < 0))
>> return NULL;
>>
>> which seems more reasonable to me, given the presence of the
>> page_ref_freeze method. So I'm not sure why try_get_page has <= 0.
>
>
> Just looked at
> commit 88b1a17dfc3ed7728316478fae0f5ad508f50397 mm: add 'try_get_page()' helper function
>
> which says:
> Also like 'get_page()', you can't use this function unless you already
> had a reference to the page. The intent is that you can use this
> exactly like get_page(), but in situations where you want to limit the
> maximum reference count.
>
> The code currently does an unconditional WARN_ON_ONCE() if we ever hit
> the reference count issues (either zero or negative), as a notification
> that the conditional non-increment actually happened.
>
> If try_get_page must not be called with an existing reference, that means
s/not//
> that when we call it the page reference is already higher and our freeze
> will never succeed. That would imply that we cannot trigger this. No?
>