Re: dm bufio: Reduce dm_bufio_lock contention

From: Mikulas Patocka
Date: Fri Jun 22 2018 - 08:52:14 EST




On Fri, 22 Jun 2018, Michal Hocko wrote:

> On Fri 22-06-18 11:01:51, Michal Hocko wrote:
> > On Thu 21-06-18 21:17:24, Mikulas Patocka wrote:
> [...]
> > > What about this patch? If __GFP_NORETRY and __GFP_FS is not set (i.e. the
> > > request comes from a block device driver or a filesystem), we should not
> > > sleep.
> >
> > Why? How are you going to audit all the callers that the behavior makes
> > sense and moreover how are you going to ensure that future usage will
> > still make sense. The more subtle side effects gfp flags have the harder
> > they are to maintain.
>
> So just as an excercise. Try to explain the above semantic to users. We
> currently have the following.
>
> * __GFP_NORETRY: The VM implementation will try only very lightweight
> * memory direct reclaim to get some memory under memory pressure (thus
> * it can sleep). It will avoid disruptive actions like OOM killer. The
> * caller must handle the failure which is quite likely to happen under
> * heavy memory pressure. The flag is suitable when failure can easily be
> * handled at small cost, such as reduced throughput
>
> * __GFP_FS can call down to the low-level FS. Clearing the flag avoids the
> * allocator recursing into the filesystem which might already be holding
> * locks.
>
> So how are you going to explain gfp & (__GFP_NORETRY | ~__GFP_FS)? What
> is the actual semantic without explaining the whole reclaim or force
> users to look into the code to understand that? What about GFP_NOIO |
> __GFP_NORETRY? What does it mean to that "should not sleep". Do all
> shrinkers have to follow that as well?

My reasoning was that there is broken code that uses __GFP_NORETRY and
assumes that it can't fail - so conditioning the change on !__GFP_FS would
minimize the diruption to the broken code.

Anyway - if you want to test only on __GFP_NORETRY (and fix those 16
broken cases that assume that __GFP_NORETRY can't fail), I'm OK with that.

Mikulas


Index: linux-2.6/mm/vmscan.c
===================================================================
--- linux-2.6.orig/mm/vmscan.c
+++ linux-2.6/mm/vmscan.c
@@ -2674,6 +2674,7 @@ static bool shrink_node(pg_data_t *pgdat
* the LRU too quickly.
*/
if (!sc->hibernation_mode && !current_is_kswapd() &&
+ !(sc->gfp_mask & __GFP_NORETRY) &&
current_may_throttle() && pgdat_memcg_congested(pgdat, root))
wait_iff_congested(BLK_RW_ASYNC, HZ/10);