Re: [PATCH 3/6] composefs: Add descriptor parsing code

From: Alexander Larsson
Date: Tue Jan 10 2023 - 11:36:01 EST


On Thu, 2023-01-05 at 15:02 -0500, Brian Masney wrote:
> On Mon, Nov 28, 2022 at 12:16:59PM +0100, Alexander Larsson wrote:
> > This adds the code to load and decode the filesystem descriptor
> > file
> > format.
> >
> >
> > +#define CFS_N_PRELOAD_DIR_CHUNKS 4
>
> From looking through the code it appears that this is actually the
> maximum number of chunks. Should this be renamed from PRELOAD to MAX?
>

No, this is the number of directory chunks we statically pre-load while
loading the inode. If there are more (i.e. for larger directories) we
load chunks dynamically as needed during the directory operations.

>
> > +       header->magic = cfs_u32_from_file(header->magic);
> > +       header->data_offset = cfs_u64_from_file(header-
> > >data_offset);
> > +       header->root_inode = cfs_u64_from_file(header->root_inode);
>
> Should the cpu to little endian conversion occur in cfs_read_data()?

cfs_read_data() reads just a block of opaque data. It can't know which
parts make up individual values to convert.

> > +
> > +       if (header->magic != CFS_MAGIC ||
> > +           header->data_offset > ctx.descriptor_len ||
> > +           sizeof(struct cfs_header_s) + header->root_inode >
> > +                   ctx.descriptor_len) {
> > +               res = -EINVAL;
>
> Should this be -EFSCORRUPTED?
>

I don't think so. I can see the argument for it, but at this point we
just looked at a file based on a mount argument, and it seems
completely wrong (not just corrupt in the middle). Reporting EINVAL in
the mount syscall for "invalid argument" seems more right to me.


> >
> > +static void *cfs_get_vdata_buf(struct cfs_context_s *ctx, u64
> > offset, u32 len,
> > +                              struct cfs_buf *buf)
> > +{
> > +       if (offset > ctx->descriptor_len - ctx->header.data_offset)
> > +               return ERR_PTR(-EINVAL);
> > +
> > +       if (len > ctx->descriptor_len - ctx->header.data_offset -
> > offset)
> > +               return ERR_PTR(-EINVAL);
>
> It appears that these same checks are already done in cfs_get_buf().
>

Not quite. It is true that we check in cfs_get_buf() that the arguments
are not completely outside the file. However, this part checks that the
offset is specifically within the vdata part of the file. In
particular, if we rely on the checks in cfs_get_buf() we could get
confused if the below data_offset + offset wrapped.

> > +
> > +       return cfs_get_buf(ctx, ctx->header.data_offset + offset,
> > len, buf);
> > +}


>
>
> > +       ino->flags = cfs_read_u32(&data);
> > +
> > +       inode_size = cfs_inode_encoded_size(ino->flags);
>
> Should CFS_INODE_FLAGS_DIGEST_FROM_PAYLOAD also be accounted for in
> cfs_inode_encoded_size()?

No, that flag just means that the already existing payload (which
contains the pathname for the backing data) should be used as the
source for the digest (to avoid storing it twice). It doesn't take up
any extra space otherwise.

> Also, cfs_inode_encoded_size() is only used here so can be brought
> into this file.
>

I see this as sort of part of the filesystem on-disk format, so I
rather have it in the cfs.h header with the rest of the fs definition.

>
> > +static bool cfs_validate_filename(const char *name, size_t
> > name_len)
> > +{
> > +       if (name_len == 0)
> > +               return false;
> > +
> > +       if (name_len == 1 && name[0] == '.')
> > +               return false;
> > +
> > +       if (name_len == 2 && name[0] == '.' && name[1] == '.')
>
> Can strcmp() be used here?
>

name is not zero-terminated, so I don't think so. At least not in any
natural way.

> > +       inode_data->has_digest = ret != 0;
>
> Can you do 'has_digest = inode_data->digest != NULL;' to get rid of
> the need for return 1 in cfs_get_digest().

No, because ->digest is an in-line array, not a pointer.

> > +static inline int memcmp2(const void *a, const size_t a_size,
> > const void *b,
> > +                         size_t b_size)
> > +{
> > +       size_t common_size = min(a_size, b_size);
> > +       int res;
> > +
> > +       res = memcmp(a, b, common_size);
> > +       if (res != 0 || a_size == b_size)
> > +               return res;
> > +
> > +       return a_size < b_size ? -1 : 1;
>
> This function appears to be used only in one place below. It doesn't
> seem like it matters for the common_size. Can this just be dropped
> and use memcmp()?

Not sure what you mean by this. This is used as a sort/order function,
not just as an "is-the-same" check, so we have to report e.g. that
"prefixXYZ" is after "prefix". In other words, this is essentially
strcmp(), but the strings are not zero terminated.

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
=-=-=
Alexander Larsson Red Hat,
Inc
alexl@xxxxxxxxxx alexander.larsson@xxxxxxxxx
He's a deeply religious Amish hairdresser who hides his scarred face
behind a mask. She's a mentally unstable gypsy bodyguard in the witness
protection program. They fight crime!