Re: [PATCH v3 0/3] ceph: fscrypt: fix atomic open bug for encrypted directories

From: Luís Henriques
Date: Tue Mar 21 2023 - 08:14:04 EST


Eric Biggers <ebiggers@xxxxxxxxxx> writes:

> On Mon, Mar 20, 2023 at 08:47:18PM +0800, Xiubo Li wrote:
>>
>> On 20/03/2023 19:20, Ilya Dryomov wrote:
>> > On Mon, Mar 20, 2023 at 2:07 AM Xiubo Li <xiubli@xxxxxxxxxx> wrote:
>> > >
>> > > On 17/03/2023 02:14, Luís Henriques wrote:
>> > > > Hi!
>> > > >
>> > > > I started seeing fstest generic/123 failing in ceph fscrypt, when running it
>> > > > with 'test_dummy_encryption'. This test is quite simple:
>> > > >
>> > > > 1. Creates a directory with write permissions for root only
>> > > > 2. Writes into a file in that directory
>> > > > 3. Uses 'su' to try to modify that file as a different user, and
>> > > >    gets -EPERM
>> > > >
>> > > > All the test steps succeed, but the test fails to cleanup: 'rm -rf <dir>'
>> > > > will fail with -ENOTEMPTY. 'strace' shows that calling unlinkat() to remove
>> > > > the file got a -ENOENT and then -ENOTEMPTY for the directory.
>> > > >
>> > > > This is because 'su' does a drop_caches ('su (874): drop_caches: 2' in
>> > > > dmesg), and ceph's atomic open will do:
>> > > >
>> > > > 	if (IS_ENCRYPTED(dir)) {
>> > > > 		set_bit(CEPH_MDS_R_FSCRYPT_FILE, &req->r_req_flags);
>> > > > 		if (!fscrypt_has_encryption_key(dir)) {
>> > > > 			spin_lock(&dentry->d_lock);
>> > > > 			dentry->d_flags |= DCACHE_NOKEY_NAME;
>> > > > 			spin_unlock(&dentry->d_lock);
>> > > > 		}
>> > > > 	}
>> > > >
>> > > > Although 'dir' has the encryption key available, fscrypt_has_encryption_key()
>> > > > will return 'false' because fscrypt info isn't yet set after the cache
>> > > > cleanup.
>> > > >
>> > > > The first patch will add a new helper for the atomic_open that will force
>> > > > the fscrypt info to be loaded into an inode that has been evicted recently
>> > > > but for which the key is still available.
>> > > >
>> > > > The second patch switches ceph atomic_open to use the new fscrypt helper.
>> > > >
>> > > > Cheers,
>> > > > --
>> > > > Luís
>> > > >
>> > > > Changes since v2:
>> > > > - Make helper more generic and to be used both in lookup and atomic open
>> > > >   operations
>> > > > - Modify ceph_lookup (patch 0002) and ceph_atomic_open (patch 0003) to use
>> > > >   the new helper
>> > > >
>> > > > Changes since v1:
>> > > > - Dropped IS_ENCRYPTED() from helper function because kerneldoc says
>> > > >   already that it applies to encrypted directories and, most importantly,
>> > > >   because it would introduce a different behaviour for
>> > > >   CONFIG_FS_ENCRYPTION and !CONFIG_FS_ENCRYPTION.
>> > > > - Rephrased helper kerneldoc
>> > > >
>> > > > Changes since initial RFC (after Eric's review):
>> > > > - Added kerneldoc comments to the new fscrypt helper
>> > > > - Dropped '__' from helper name (now fscrypt_prepare_atomic_open())
>> > > > - Added IS_ENCRYPTED() check in helper
>> > > > - DCACHE_NOKEY_NAME is not set if fscrypt_get_encryption_info() returns an
>> > > >   error
>> > > > - Fixed helper for !CONFIG_FS_ENCRYPTION (now defined 'static inline')
>> > > This series looks good to me.
>> > >
>> > > And I have run the test locally and it worked well.
>> > >
>> > >
>> > > > Luís Henriques (3):
>> > > > fscrypt: new helper function - fscrypt_prepare_lookup_partial()
>> > > Eric,
>> > >
>> > > If possible, we can pick this together into the ceph repo, but we need your ack
>> > > about this. Or you can pick it to the crypto repo then please feel free
>> > > to add:
>> > >
>> > > Tested-by: Xiubo Li <xiubli@xxxxxxxxxx> and Reviewed-by: Xiubo Li
>> > > <xiubli@xxxxxxxxxx>
>> > I would prefer the fscrypt helper to go through the fscrypt tree.
>>
>> Sure. This also LGTM.
>>
>> Thanks
>>
>
> I've applied it to
> https://git.kernel.org/pub/scm/fs/fscrypt/linux.git/log/?h=for-next
>
> But I ended up reworking the comment a bit and moving the function to be just
> below __fscrypt_prepare_lookup(). So I sent out v4 that matches what I applied.

Awesome, thanks a lot, Eric.

> BTW, I'm wondering if anyone has had any thoughts about the race condition I
> described at https://lore.kernel.org/r/ZBC1P4Gn6eAKD61+@sol.localdomain/. In
> particular, I'm wondering whether this helper function will need to be changed
> or not. Maybe not, because ceph could look at DCACHE_NOKEY_NAME to determine
> whether the name should be treated as a no-key name or not, instead of checking
> fscrypt_has_encryption_key() again (as I think it is doing currently)?

I started looking into that but, to be honest, I haven't yet reached any
conclusion. It looks like the ceph code that handles filenames *may* have
this race too (I'm looking at ceph_fill_trace()) but I'm still not 100%
sure. In any case, I think that an eventual fix for this race (if it does
indeed exist!) will likely be restricted to the ceph code and won't touch
the generic fscrypt code. But I'm still looking...
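
To make the suggestion quoted above more concrete, here is a minimal
userspace sketch (not kernel code, and not the actual ceph or fscrypt
implementation). It only models the idea of recording the "no-key" decision
once at lookup time and consulting that record later, instead of asking the
directory again whether it has a key. All names prefixed with model_ or
MODEL_ are hypothetical; MODEL_NOKEY_NAME merely stands in for
DCACHE_NOKEY_NAME.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for DCACHE_NOKEY_NAME. */
#define MODEL_NOKEY_NAME 0x1u

/* Hypothetical model of an encrypted directory: is its key set up? */
struct model_dir {
	bool key_loaded;
};

/* Hypothetical model of a dentry: only the flags word matters here. */
struct model_dentry {
	unsigned int flags;
};

/*
 * Lookup time: record on the dentry whether the name was looked up
 * without the key. This is the only place the directory is consulted.
 */
static void model_prepare_lookup(const struct model_dir *dir,
				 struct model_dentry *dentry)
{
	assert(dir != NULL && dentry != NULL);
	if (!dir->key_loaded)
		dentry->flags |= MODEL_NOKEY_NAME;
}

/*
 * Later check, variant 1: ask the directory again. The answer can differ
 * from the one taken at lookup time if the key was added in between.
 */
static bool model_is_nokey_recheck(const struct model_dir *dir)
{
	return !dir->key_loaded;
}

/*
 * Later check, variant 2: trust the flag recorded at lookup time, so the
 * answer stays consistent with how the name was actually looked up.
 */
static bool model_is_nokey_from_flag(const struct model_dentry *dentry)
{
	return (dentry->flags & MODEL_NOKEY_NAME) != 0;
}
```

In this model, if the key is added after model_prepare_lookup() has run,
variant 1 reports that the name is not a no-key name while variant 2 still
reports that it is. Whether the real ceph code paths behave this way is
exactly the open question above.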

Cheers,
--
Luís