Re: [RFC PATCH v2 0/4] ceph: safely use 'copy-from' Op on Octopus OSDs

From: Sage Weil
Date: Thu Nov 14 2019 - 09:18:05 EST


On Thu, 14 Nov 2019, Ilya Dryomov wrote:
> > > I'm just getting caught up on the discussion here, but why was it
> > > decided to do it this way instead of just adding a new OSD
> > > "copy-from-no-truncseq" operation? Once you tried it once and an OSD
> > > didn't support it, you could just give up on using it any longer? That
> > > seems a lot simpler than trying to monkey with feature bits.
> >
> > I don't remember the original discussion either, but in retrospect that
> > does seem much simpler--especially since hte client is conditioning
> > sending this based on the the require_osd_release. It seems like passing
> > a flag to the copy-from op would be more reasonable instead of conditional
> > feature-based behavior.
>
> Yeah, I suggested adding require_osd_release to the client portion just
> because we are running into it more and more: Objecter relies on it for
> RESEND_ON_SPLIT for example. It needs to be accessible so that patches
> like that can be carried over to the kernel client without workarounds.
>
> copy-from in its existing form is another example. AFAIU the problem
> is that copy-from op doesn't reject unknown flags. Luis added a flag
> in https://github.com/ceph/ceph/pull/25374, but it is simply ignored on
> nautilus and older releases, potentially leading to data corruption.
>
> Adding a new op that would be an alias for CEPH_OSD_OP_COPY_FROM with
> CEPH_OSD_COPY_FROM_FLAG_TRUNCATE_SEQ like Jeff is suggesting, or a new
> copy-from2 op that would behave just like copy-from, but reject unknown
> flags to avoid similar compatibility issues in the future is probably
> the best thing we can do from the client perspective.

Yeah, I think copy-from2 is the best path. I think that means we should
revert what we merged to ceph.git a few weeks back, Luis! Sorry we didn't
put all the pieces together before...

sage