Re: [PATCH] v9fs: handle async processing of F_SETLK with FL_SLEEP flag

From: Vasily Averin
Date: Fri Dec 24 2021 - 03:24:18 EST


On 24.12.2021 10:31, Dominique Martinet wrote:
> Vasily Averin wrote on Fri, Dec 24, 2021 at 10:08:57AM +0300:
>>> I'm not up to date with lock mechanisms, could you confirm I understand
>>> the flags right?
>>> - F_SETLK: tries to lock, on conflict return immediately with error
>>> - F_SETLKW|FL_SLEEP: tries to lock, on conflict wait for lock to become available
>>> - F_SETLKW: not possible through flock/fcntl setlk, can happen otherwise?
>>> but for 9p purpose same as above.
>>> - F_SETLK|FL_SLEEP: tries to lock, on conflict ????? you'd want it to
>>> return immediately but setup some callback to be woken up? how could
>>> that work without passing some wake up struct? or just behave as plain
>>> F_SETLK? but then FL_SLEEP has no purpose, I don't get it.
>>
>> I apologize in advance for the long answer, but I tried to state all the details
>> of the detected problem.
>>
>> Below is description copy-pasted from comment above vfs_lock_file()
>
> Thanks, I hadn't noticed this comment this morning.
>
>> "
>> * To avoid blocking kernel daemons, such as lockd, that need to acquire POSIX
>> * locks, the ->lock() interface may return asynchronously, before the lock has
>> * been granted or denied by the underlying filesystem, if (and only if)
>> * lm_grant is set. Callers expecting ->lock() to return asynchronously
>> * will only use F_SETLK, not F_SETLKW; they will set FL_SLEEP if (and only if)
>> * the request is for a blocking lock. When ->lock() does return asynchronously,
>> * it must return FILE_LOCK_DEFERRED, and call ->lm_grant() when the lock
>> * request completes.
>
> Ok so that's the part I was missing.
> The file_lock struct will have fl_lmops->lm_grant set in this case and
> we just need to remember that and call lm_grant when the lock has been set...

Unfortunately it isn't enough.
lm_grant is defined by lockd only, but nfsd and ksmbd does not use it.

Most of file systems does not use lm_grant, and it's OK.
2 file systems -- gfs2 and ocfs -- use lm_grant via dlm_posix_lock(),
which works correctly even if lm_grant is not defined.

>> They all are servers, and they can receive blocking lock requests from own clients.
>> They all cannot process such requests synchronously because it causes server deadlock.
>> In simplest form, if single threaded nfsd is blocked on processing such request,
>> whole nfs server cannot process any other commands.
>
> Note 9p does not have an fh_to_dentry op (no open by handle type of
> calls, the protocol just has no way of dealing with it), so I believe
> knfsd cannot re-export 9p filesystems and wouldn't be surprised if
> others can't either -- as thus this all might not be an issue for you if
> F_SETLK|FL_SLEEP users all are such servers

Perhaps you are right.
I'm not qualified enough to confirm it, but I tend to agree with you: without defined export_op
nfsd cannot export your file system.
However:
1) perhaps this issue can be triggered via ksmbd
2) you can add export_op in the future and forget about this problem.
3) proposed changes are minimal and does not affect any other use cases.

>> One of our customers tried to export fuse/glusters via nfsd and reported about
>> memory corruption in nfsd4_lock() function. We found few related bugs in nfsd,
>> however finally we noticed that fuse blocks on processing such requests.
>> During investigation we have found that fuse just ignored F_SETLK command,
>> handled FL_SLEEP flag and submitted blocking FUSE_SETLKW command.
>
> I'm not sure I understand how upgrading to SETLKW is worse than dropping
> the FL_SLEEP flag (I mean, I see it's bad as it wasn't what the server
> expects, but while it will block a thread for an undefined period of
> time and may cause deadlocks I don't see how it would cause memory
> corruptions?)

kernel threads cannot use blocking SETLKW.
SETLKW can be safely used by usual processes, if they want to wait until lock
will be captured. However it is not an option for server with limited number
of working threads.

Memory corruption was internal nfsd issues triggered by very long processing
of blocking lock request in fuse.

I'm sorry, I'll answer rest of questions later.

Thank you, Vasily Averin