Re: [PATCH v3 0/3] ceph: fscrypt: fix atomic open bug for encrypted directories

From: Eric Biggers
Date: Mon Mar 20 2023 - 18:16:20 EST


On Mon, Mar 20, 2023 at 08:47:18PM +0800, Xiubo Li wrote:
>
> On 20/03/2023 19:20, Ilya Dryomov wrote:
> > On Mon, Mar 20, 2023 at 2:07 AM Xiubo Li <xiubli@xxxxxxxxxx> wrote:
> > >
> > > On 17/03/2023 02:14, Luís Henriques wrote:
> > > > Hi!
> > > >
> > > > I started seeing fstest generic/123 failing in ceph fscrypt, when running it
> > > > with 'test_dummy_encryption'. This test is quite simple:
> > > >
> > > > 1. Creates a directory with write permissions for root only
> > > > 2. Writes into a file in that directory
> > > > 3. Uses 'su' to try to modify that file as a different user, and
> > > > gets -EPERM
> > > >
> > > > All the test steps succeed, but the test fails to clean up: 'rm -rf <dir>'
> > > > will fail with -ENOTEMPTY. 'strace' shows that calling unlinkat() to remove
> > > > the file got a -ENOENT and then -ENOTEMPTY for the directory.
> > > >
> > > > This is because 'su' does a drop_caches ('su (874): drop_caches: 2' in
> > > > dmesg), and ceph's atomic open will do:
> > > >
> > > > 	if (IS_ENCRYPTED(dir)) {
> > > > 		set_bit(CEPH_MDS_R_FSCRYPT_FILE, &req->r_req_flags);
> > > > 		if (!fscrypt_has_encryption_key(dir)) {
> > > > 			spin_lock(&dentry->d_lock);
> > > > 			dentry->d_flags |= DCACHE_NOKEY_NAME;
> > > > 			spin_unlock(&dentry->d_lock);
> > > > 		}
> > > > 	}
> > > >
> > > > Although 'dir' has the encryption key available, fscrypt_has_encryption_key()
> > > > will return 'false' because fscrypt info isn't yet set after the cache
> > > > cleanup.
> > > >
> > > > The first patch will add a new helper for the atomic_open that will force
> > > > the fscrypt info to be loaded into an inode that has been evicted recently
> > > > but for which the key is still available.
> > > >
> > > > The second patch switches ceph atomic_open to use the new fscrypt helper.
> > > >
> > > > Cheers,
> > > > --
> > > > Luís
> > > >
> > > > Changes since v2:
> > > > - Make the helper more generic so it can be used in both lookup and
> > > > atomic open operations
> > > > - Modify ceph_lookup (patch 0002) and ceph_atomic_open (patch 0003) to use
> > > > the new helper
> > > >
> > > > Changes since v1:
> > > > - Dropped IS_ENCRYPTED() from the helper function because the kerneldoc
> > > > already says that it applies to encrypted directories and, most importantly,
> > > > because it would introduce a different behaviour for
> > > > CONFIG_FS_ENCRYPTION and !CONFIG_FS_ENCRYPTION.
> > > > - Rephrased helper kerneldoc
> > > >
> > > > Changes since initial RFC (after Eric's review):
> > > > - Added kerneldoc comments to the new fscrypt helper
> > > > - Dropped '__' from helper name (now fscrypt_prepare_atomic_open())
> > > > - Added IS_ENCRYPTED() check in helper
> > > > - DCACHE_NOKEY_NAME is not set if fscrypt_get_encryption_info() returns an
> > > > error
> > > > - Fixed helper for !CONFIG_FS_ENCRYPTION (now defined 'static inline')
> > > This series looks good to me.
> > >
> > > I have also run the test locally and it worked well.
> > >
> > >
> > > > Luís Henriques (3):
> > > > fscrypt: new helper function - fscrypt_prepare_lookup_partial()
> > > Eric,
> > >
> > > If possible, we can pick this up together with the rest in the ceph repo,
> > > in which case we need your ack. Or you can pick it up in the crypto repo,
> > > in which case please feel free to add:
> > >
> > > Tested-by: Xiubo Li <xiubli@xxxxxxxxxx> and Reviewed-by: Xiubo Li
> > > <xiubli@xxxxxxxxxx>
> > I would prefer the fscrypt helper to go through the fscrypt tree.
>
> Sure. This also LGTM.
>
> Thanks
>

I've applied it to
https://git.kernel.org/pub/scm/fs/fscrypt/linux.git/log/?h=for-next

But I ended up reworking the comment a bit and moving the function to be just
below __fscrypt_prepare_lookup(). So I sent out v4 that matches what I applied.

BTW, I'm wondering if anyone has had any thoughts about the race condition I
described at https://lore.kernel.org/r/ZBC1P4Gn6eAKD61+@sol.localdomain/. In
particular, I'm wondering whether this helper function will need to be changed
or not. Maybe not, because ceph could look at DCACHE_NOKEY_NAME to determine
whether the name should be treated as a no-key name or not, instead of checking
fscrypt_has_encryption_key() again (as I think it is doing currently)?
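
To make that idea concrete, here is a rough user-space model of it. This is not
kernel code: struct model_dir, struct model_dentry, MODEL_NOKEY_NAME and the
three functions are all hypothetical names made up for illustration, and the
model assumes the only thing that changes between lookup and later use is
whether the key is present. It only shows the general pattern of sampling the
key state once at lookup time, recording the result in a flag, and consulting
that flag afterwards instead of sampling the key state a second time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are real kernel definitions. */
#define MODEL_NOKEY_NAME 0x1u

struct model_dir {
	bool has_key;		/* stands in for "the directory's key is available" */
};

struct model_dentry {
	unsigned int flags;	/* stands in for dentry->d_flags */
};

/* Lookup time: sample the key state once and record it in the dentry. */
static void model_lookup(const struct model_dir *dir, struct model_dentry *d)
{
	assert(dir != NULL && d != NULL);
	d->flags = 0;
	if (!dir->has_key)
		d->flags |= MODEL_NOKEY_NAME;
}

/* Later use: trust the recorded flag rather than re-checking the key. */
static bool model_name_is_nokey(const struct model_dentry *d)
{
	assert(d != NULL);
	return (d->flags & MODEL_NOKEY_NAME) != 0;
}

/* Racy variant: re-samples the key state, which may have changed since lookup. */
static bool model_name_is_nokey_racy(const struct model_dir *dir)
{
	assert(dir != NULL);
	return !dir->has_key;
}
```

In this model, if the key shows up after model_lookup() has run, then
model_name_is_nokey() still reports the name as a no-key name, which matches
how the name was actually produced, while model_name_is_nokey_racy() flips its
answer. Whether the real code needs the same treatment is exactly the open
question.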

- Eric