Re: [PATCH 2/2] Documentation/process: Add a maintainer handbook for KVM x86

From: Sean Christopherson
Date: Tue Feb 21 2023 - 20:55:18 EST


On Mon, Feb 20, 2023, Like Xu wrote:
> On 18/2/2023 6:54 am, Sean Christopherson wrote:
> > Add a KVM x86 doc to the subsystem/maintainer handbook section to explain
> > how KVM x86 (currently) operates as a sub-subsystem, and to soapbox on
> > the rules and expectations for contributing to KVM x86.
> >
> > Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> > ---
> > .../process/maintainer-handbooks.rst | 1 +
> > Documentation/process/maintainer-kvm-x86.rst | 347 ++++++++++++++++++
> > MAINTAINERS | 1 +
> > 3 files changed, 349 insertions(+)
> > create mode 100644 Documentation/process/maintainer-kvm-x86.rst
> >
> > diff --git a/Documentation/process/maintainer-handbooks.rst b/Documentation/process/maintainer-handbooks.rst
> > index d783060b4cc6..d12cbbe2b7df 100644
> > --- a/Documentation/process/maintainer-handbooks.rst
> > +++ b/Documentation/process/maintainer-handbooks.rst
> > @@ -17,3 +17,4 @@ Contents:
> > maintainer-tip
> > maintainer-netdev
> > + maintainer-kvm-x86
> > diff --git a/Documentation/process/maintainer-kvm-x86.rst b/Documentation/process/maintainer-kvm-x86.rst
> > new file mode 100644
> > index 000000000000..ac81a42a32a3
> > --- /dev/null
> > +++ b/Documentation/process/maintainer-kvm-x86.rst
> > @@ -0,0 +1,347 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +
> > +KVM x86
>
> KERNEL VIRTUAL MACHINE FOR X86 (KVM/x86)

My preference is "KVM x86", but I don't have a super strong opinion. I definitely
don't think it's necessary to strictly match the MAINTAINERS file.

> > +Lifecycle
> > +~~~~~~~~~
> > +Pull requests for the next release cycle are sent to the main KVM tree, one
> > +for each KVM x86 topic branch. If all goes well, the topic branches are rolled
> > +into the main KVM pull request sent during Linus' merge window. Pull requests
> > +for KVM x86 branches are typically made the week before Linus' opening of the
> > +merge window, e.g. the week following rc7 for "normal" releases.
> > +
> > +The KVM x86 tree doesn't have its own official merge window, but there's a soft
> > +close around rc5 for new features, and a soft close around rc6 for fixes.
>
> Any urgent AND critical fixes are exempt. No?

More or less. The majority of urgent and critical fixes should target the current
release, and the "soft" qualifier on "soft close" covers the rest. Ah, but this
section doesn't say anything about fixes that target mainline. I'll add a paragraph
to explicitly talk about fixes.

> > +SDM and APM References
> > +~~~~~~~~~~~~~~~~~~~~~~
> > +Much of KVM's code base is directly tied to architectural behavior defined in
> > +Intel's Software Development Manual (SDM) and AMD's Architecture Programmer’s
> > +Manual (APM). Use of "Intel's SDM" and "AMD's APM", or even just "SDM" or
> > +"APM", without additional context is a-ok.
>
> ISE: Intel® Architecture Instruction Set Extensions and Future Features
> PPR: Processor Programming Reference (PPR) for specific AMD Model

Hmm, I think we should explicitly spell those out in changelogs. ISE releases
should _never_ be referenced in code, and I would prefer

> > +
> > +Do not reference specific sections, tables, figures, etc. by number, especially
> > +not in comments. Instead, copy-paste the relevant snippet (if warranted), and
> > +reference sections/tables/figures by name. The layouts of the SDM and APM are
> > +constantly changing, and so the numbers/labels aren't stable/consistent.
> > +
> > +Generally speaking, do not copy-paste SDM or APM snippets into comments. With
> > +few exceptions, KVM *must* honor architectural behavior, therefore it's implied
> > +that KVM behavior is emulating SDM and/or APM behavior.
>
> All undefined behaviors (if any) need to be clarified.
>
> > +
> > +Shortlog
> > +~~~~~~~~
> > +The preferred prefix format is ``KVM: <topic>:``, where ``<topic>`` is one of::
> > +
> > + - x86
> > + - x86/mmu
> > + - x86/pmu
> > + - x86/xen
>
> Any conflict w/ "KVM X86 Xen (KVM/Xen)" ? Then "KVM X86 HYPER-V (KVM/hyper-v)" ?

I didn't follow this question. Are you asking if we should have "KVM: x86/hyperv:"?
If so, my answer is "probably not". The Hyper-V code isn't as isolated as the Xen
code and much of it is vendor specific, e.g. I would rather Hyper-V changes that
primarily touch vmx.c be tagged "KVM: VMX" as it's possible, however unlikely, that
non-Hyper-V users could be affected.

> > + - selftests > + - SVM
> > + - nSVM
> > + - VMX
> > + - nVMX
>
> emulator ? lapic ?

I don't want to introduce "emulator" as I don't think there will be a high enough
volume of patches to benefit from the extra qualifier, the emulator isn't as isolated
as we like to think it is, and because KVM does a lot of emulation outside of the
emulator. E.g. enter_smm() doesn't flow through emulate.c, but emulator_leave_smm()
_only_ gets invoked via the emulator.

Similar to Hyper-V, I'm hesitant to introduce x86/apic as many "APIC" changes are
actually in vendor specific code, e.g. much of APICv, AVIC, IPI virtualization, etc.
I'm definitely not opposed to using x86/apic for local and and maybe even I/O APIC
changes though.

> > +**DO NOT use x86/kvm!** ``x86/kvm`` is used exclusively for Linux-as-a-KVM-guest
> > +changes, i.e. for arch/x86/kernel/kvm.c.
> > +
> > +Note, these don't align with the topics branches (the topic branches care much
> > +more about code conflicts).
> > +
> > +All names are case sensitive! ``KVM: x86:`` is good, ``kvm: vmx:`` is not.
> > +
> > +Capitalize the first word of the condensed patch description, but omit ending
> > +punctionation. E.g.::
>
> s/punctionation/punctuation
>
> > +
> > + KVM: x86: Fix a null pointer dererence in function_xyz()
>
> s/dererence/dereference

Heh, at least I'm consistent.

> > + kvm: x86: fix a null pointer dererence in function_xyz.
> > +
> > +If a patch touches multiple topics, traverse up the conceptual tree to find the
> > +first common parent (which is often simply ``x86``). When in doubt,
> > +``git log path/to/file`` should provide a reasonable hint.
> > +
> > +New topics do occasionally pop up, but please start an on-list discussion if
> > +you want to propose introducing a new topic, i.e. don't go rogue.
> > +
> > +Do not use file names or complete file paths as the subject/shortlog prefix.
> > +
> > +Changelog
> > +~~~~~~~~~
> > +Write changelogs using imperative mood and avoid pronouns. Lead with a short
> > +blurb on what is changing, and then follow up with the context and background.
> > +Note! This order directly conflicts with the tip tree's preferred approach!
>
> Emm, could this be considered/clarified as an incentive option ?

Can you elaborate? I don't understand what you're asking.

> > +Testing
> > +-------
> > +At a bare minimum, *all* patches in a series must build cleanly for KVM_INTEL=m
> > +KVM_AMD=m, and KVM_WERROR=y. Building every possible combination of Kconfigs
> > +isn't feasible, but the more the merrier. KVM_SMM, KVM_XEN, PROVE_LOCKING, and
> > +X86_64 are particularly interesting knobs to turn.
> > +
> > +Running KVM selftests and KVM-unit-tests is also mandatory (and stating the
> > +obvious, the tests need to pass). When possible and relevant, testing on both
> > +Intel and AMD is strongly preferred. Booting an actual VM is encouraged, but
> > +not mandatory.
>
> Testing L2 guest available features inside L1 is also encouraged.

Hmm. Sort of. In a perfect world where everyone has unlimited time, then testing
a inside a nested VM would be encouraged. But practically speaking, doing basic
testing of a feature in a nested VM doesn't buy all that much meaningful coverage,
and can even give a false sense of security.

The things we tend to get wrong are the edge cases, error handling, etc., and those
are unlikely to be excercised by basic testing, i.e. really need dedicated tests.
And if we have dedicated tests, then nested is covered by "Running selftests and KUT
are mandatory".

At least for official documentation, I think I would prefer not to explicitly
encourage people to use their time booting and testing in nested VMs.

> > +For changes that touch KVM's shadow paging code, running with TDP (EPT/NPT)
> > +disabled is mandatory. For changes that affect common KVM MMU code, running
> > +with TDP disabled is strongly encouraged. For all other changes, if the code
> > +being modified depends on and/or interacts with a module param, testing with
> > +the relevant settings is mandatory.
> > +
> > +Note, KVM selftests and KVM-unit-tests do have known failures. If you suspect
> > +a failure is not due to your changes, verify that the *exact same* failure
> > +occurs with and without your changes.
> > +
> > +If you can't fully test a change, e.g. due to lack of hardware, clearly state
> > +what level of testing you were able to do, e.g. in the cover letter.
>
> Add an RFT (request for test) tag.

I'm not necessarily opposed to the idea of "RFT", but until it's a ubiquitous
and/or formally documented tag I would prefer to avoid using RFT KVM x86.

For the overwhelming majority of changes where "I couldn't test this" is legitimate
_and_ the series/patch isn't tagged RFC, I have access to the necessary hardware.
For the few cases where I don't have hardware, I don't mind hunting down someone
who does. I.e. blasting "RFT to everyone isn't necessary.

In other words, I think we'll spend more time explaining and trying to understand
the use of RFT than we would benefit from using it to identify when a series needs
extra testing.

> > +New Features
> > +~~~~~~~~~~~~
> > +With one exception, new features *must* come with test coverage. KVM specific
> > +tests aren't strictly required, e.g. if coverage is provided by running a
> > +sufficiently enabled guest VM, or by running a related kernel selftest in a VM,
> > +but dedicated KVM tests are preferred in all cases. Negative testcases in
> > +particular are mandatory for enabling of new hardware features as error and
> > +exception flows are rarely exercised simply by running a VM.
> > +
> > +The only exception to this rule is if KVM is simply advertising support for a
> > +feature via KVM_SET_SUPPORTED_CPUID, i.e. for instructions/features that KVM
> > +can't prevent a guest from using and for which there is no true enabling.
> > +
> > +Note, "new features" does not just mean "new hardware features"! New features
> > +that can't be well validated using existing KVM selftests and/or KVM-unit-tests
> > +must come with tests.
>
> must come with tests to ensure good code coverage.

I _really_ don't want to add that qualifier. Code coverage is but one aspect of
testing, and it's not even #1 priority, e.g. exercising code that violates architectural
behavior means nothing if the test doesn't detect the error. I.e. the purpose of
tests isn't _just_ to ensure good code coverage.

> > +Posting new feature development without tests to get early feedback is more
> > +than welcome, but such submissions should be tagged RFC, and the cover letter
> > +should clearly state what type of feedback is requested/expected. Do not abuse
> > +the RFC process; RFCs will typically not receive in-depth review.
> > +
> > +Bug Fixes
> > +~~~~~~~~~
> > +Except for "obvious" found-by-inspection bugs, fixes must be accompanied by a
> > +reproducer for the bug being fixed. In many cases the reproducer is implicit,
> > +e.g. for build errors and test failures, but it should still be clear to
> > +readers what is broken and how to verify the fix. Some leeway is given for
> > +bugs that are found via non-public workloads/tests, but providing regression
> > +tests for such bugs is strongly preferred.
>
> tests or detailed reproduction sequence for such bugs is strongly preferred.

No, just tests. This is specifically talking about non-public workloads. If a
detailed reproduction sequence _doesn't_ involve writing code, i.e. a test, then
the fix should simply provider the reproducer.

If you're talking about the sequence of events in KVM that result in the bug,
then that info belongs in the changelog and is largely covered by the tip tree
handbook.

> > +Git Base
> > +~~~~~~~~
> > +If you are using git version 2.9.0 or later (Googlers, this is all of you!),
>
> Please do not mention specific developers or groups in this type of document.

Honest question, why not?

Calling out a specific developer is one thing, but I don't see the harm in adding
what essentially amounts to an extra requirement for Google employees.

> > +use ``git format-patch`` with the ``--base`` flag to automatically include the
> > +base tree information in the generated patches.
> > +
> > +Note, ``--base=auto`` works as expected if and only if a branch's upstream is
> > +set to the base topic branch, e.g. it will do the wrong thing if your upstream
> > +is set to your personal repository for backup purposes. An alternative "auto"
> > +solution is to derive the names of your development branches based on their
> > +KVM x86 topic, and feed that into ``--base``. E.g. ``x86/pmu/my_branch_name``,
> > +and then write a small wrapper to extract ``pmu`` from the current branch name
> > +to yield ``--base=x/pmu``, where ``x`` is whatever name your repository uses to
> > +track the KVM x86 remote.
> > +
> > +Co-Posting Tests
> > +~~~~~~~~~~~~~~~~
> > +KVM selftests that are associated with KVM changes, e.g. regression tests for
> > +bug fixes, should be posted along with the KVM changes as a single series.
>
> Keeping git-bisect is mandatory.

Ah, yeah, I'll explicitly call out the order between tests and KVM.

> > +Vulnerabilities
> > +---------------
> > +Bugs that can be exploited by the guest to attack the host (kernel or
> > +userspace), or that can be exploited by a nested VM to *its* host (L2 attacking
> > +L1), are of particular interest to KVM. Please follow the protocol for
>
> L1, even L0 host),

No, because at that point it's just the guest attacking the host. This snippet
is purely translating "nested VM to *its* host* into familiar terminology.