Re: [PATCH 4.4 48/76] libceph: force GFP_NOIO for socket allocations

From: Michal Hocko
Date: Wed Mar 29 2017 - 07:17:27 EST


On Wed 29-03-17 13:10:01, Ilya Dryomov wrote:
> On Wed, Mar 29, 2017 at 12:55 PM, Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> > On Wed 29-03-17 12:41:26, Michal Hocko wrote:
> > [...]
> >> > ceph_con_workfn
> >> >   mutex_lock(&con->mutex)  # ceph_connection::mutex
> >> >   try_write
> >> >     ceph_tcp_connect
> >> >       sock_create_kern
> >> >         GFP_KERNEL allocation
> >> >           allocator recurses into XFS, more I/O is issued
> >
> > One more note. So what happens if this is a GFP_NOIO request which
> > cannot make any progress? Your I/O thread is blocked on con->mutex,
> > as you write below, but the thread above cannot proceed either. So I am
> > _really_ not sure this actually helps.
>
> This is not the only I/O worker. A ceph cluster typically consists of
> at least a few OSDs and can be as large as thousands of OSDs. This is
> the reason we are calling sock_create_kern() on the writeback path in
> the first place: pre-opening thousands of sockets isn't feasible.

Sorry for being dense here, but what actually guarantees forward
progress? My current understanding is that the deadlock is caused by
con->mutex being held while the allocation cannot make forward
progress. I can imagine this would be possible if the other I/O flushers
depend on this lock. But then GFP_NOIO vs. GFP_KERNEL allocation doesn't
make much difference. What am I missing?
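For reference, the patch forces NOIO around a callee (sock_create_kern()) that hard-codes GFP_KERNEL internally, which is what a scoped NOIO section is for. Below is a minimal userspace sketch of that masking, assuming semantics modeled on the kernel's memalloc_noio_save()/memalloc_noio_restore() pair. The flag values, `effective_gfp()`, `fake_sock_create_kern()` and `connect_with_noio_scope()` are hypothetical stand-ins, not kernel code. Note that it only shows which reclaim paths the allocator is allowed to take; it says nothing about whether the allocation can then make progress, which is the open question above.

```c
#include <assert.h>

/* Hypothetical userspace stand-ins for GFP bits; values are illustrative. */
typedef unsigned int gfp_t;
#define __GFP_IO      0x40u   /* allocator may start physical I/O */
#define __GFP_FS      0x80u   /* allocator may recurse into filesystems (e.g. XFS) */
#define __GFP_RECLAIM 0x400u  /* simplified: allocator may reclaim memory */
#define GFP_NOIO      (__GFP_RECLAIM)
#define GFP_KERNEL    (__GFP_RECLAIM | __GFP_IO | __GFP_FS)

#define PF_MEMALLOC_NOIO 0x1u /* illustrative per-task "no I/O" flag */

/* Per-thread flags, standing in for current->flags. */
static _Thread_local unsigned int task_flags;

/* Enter a NOIO scope; return the previous state so scopes can nest. */
static unsigned int memalloc_noio_save(void)
{
    unsigned int old = task_flags & PF_MEMALLOC_NOIO;
    task_flags |= PF_MEMALLOC_NOIO;
    return old;
}

/* Leave a NOIO scope, restoring whatever state the caller saved. */
static void memalloc_noio_restore(unsigned int old)
{
    task_flags = (task_flags & ~PF_MEMALLOC_NOIO) | old;
}

/* Hypothetical: the mask the allocator actually honours for this task. */
static gfp_t effective_gfp(gfp_t flags)
{
    if (task_flags & PF_MEMALLOC_NOIO)
        flags &= ~(__GFP_IO | __GFP_FS);
    return flags;
}

/* Hypothetical callee that always asks for GFP_KERNEL internally;
   returns the mask its allocation would really run with. */
static gfp_t fake_sock_create_kern(void)
{
    return effective_gfp(GFP_KERNEL);
}

/* Sketch of the patched connect path: wrap the callee in a NOIO scope. */
static gfp_t connect_with_noio_scope(void)
{
    unsigned int noio_flag = memalloc_noio_save();
    gfp_t effective = fake_sock_create_kern();
    memalloc_noio_restore(noio_flag);
    return effective;
}
```

Called directly, fake_sock_create_kern() runs with GFP_KERNEL; inside connect_with_noio_scope() the same call is degraded to GFP_NOIO, and nested scopes leave the outer NOIO state intact on restore.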
--
Michal Hocko
SUSE Labs