Re: [RFC 2/2] AI: Add initial set of rules and docs

From: Sasha Levin
Date: Fri Jul 25 2025 - 18:15:11 EST


On Fri, Jul 25, 2025 at 01:53:57PM -0700, Kees Cook wrote:
On Fri, Jul 25, 2025 at 01:53:58PM -0400, Sasha Levin wrote:
Add rules based on our existing documentation.

I'd still like this not in Documentation/, but I obviously defer to Jon.

Require AI to identify itself in the commit message.

Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>
---
Documentation/AI/main.md | 70 ++++++++++++++++++++++++++++++++++++++--
1 file changed, 68 insertions(+), 2 deletions(-)

diff --git a/Documentation/AI/main.md b/Documentation/AI/main.md
index 959ba50568f57..ca59e52f54445 100644
--- a/Documentation/AI/main.md
+++ b/Documentation/AI/main.md
@@ -1,5 +1,71 @@
# Linux Kernel Development AI Instructions

-This is the Linux kernel repository. When working with this codebase, you must follow the following rules:
+This is the Linux kernel repository. When working with this codebase, you must follow the Linux kernel development processes and coding standards.

-- [ TODO ]
+## Essential Documentation References
+
+### Core Development Process
+- **Documentation/process/howto.rst** - Start here! The comprehensive guide on how to become a Linux kernel developer
+- **Documentation/process/development-process.rst** - Detailed information on how the kernel development process works
+- **Documentation/process/submitting-patches.rst** - Essential guide for getting your code into the kernel
+- **Documentation/process/submit-checklist.rst** - Checklist to review before submitting code

Instead of hard-coded paths, I would recommend just discussing the topic
areas it is expected to find and ingest. :) (e.g. redo the "Key
principles" list you have later to be more specific about the topic
areas and adjust the prompting to induce the requirement to find and
read each topic.)

I'm very open to changing these parts. Ideally we can rewrite it in a
way that's easier for the agent to process rather than something that is
more readable to humans.

+
+### Coding Standards and Style
+- **Documentation/process/coding-style.rst** - Linux kernel coding style (MUST READ)
+ - Use tabs (8 characters) for indentation
+ - 80-character line limit preferred
+ - Specific formatting rules for switch statements, functions, etc.
+- **Documentation/process/programming-language.rst** - Language requirements and standards
+
+### What NOT to Do
+- **Documentation/process/deprecated.rst** - Deprecated interfaces and features to avoid
+ - Do not use BUG() or BUG_ON() - use WARN() instead
+ - Avoid deprecated APIs listed in this document
+- **Documentation/process/volatile-considered-harmful.rst** - Why volatile is usually wrong

And the reason I want to avoid such specifics is that even in the example
above, this ends up being hyperspecific. Why summarize
deprecated.rst? Just say "Find and read the notes on deprecated APIs and
language features"

When we're explicit with rules, the agent is less likely to ignore
them (and go "whoops I messed up!" later).

It's a balance we need to find, but I suspect we can fine-tune it as
we see how various agents respond to the rules.
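The two coding-style points the patch summarizes (8-column tabs, a preferred 80-column limit) are mechanical enough to check directly. A minimal Python sketch, assuming a hypothetical `overlong_lines()` helper that expands tabs to 8-column stops before measuring; it is an illustration, not checkpatch.pl's implementation:

```python
# Hypothetical helpers, not part of the kernel tree: report lines whose
# display width exceeds a limit when tabs are expanded to 8-column stops.
def display_width(line: str, tabsize: int = 8) -> int:
    """Width of `line` with each tab advanced to the next multiple of `tabsize`."""
    return len(line.expandtabs(tabsize))


def overlong_lines(text: str, limit: int = 80) -> list[int]:
    """Return the 1-based numbers of lines wider than `limit` columns."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if display_width(line) > limit
    ]


# Nine tabs (72 columns) plus "return 0;" (9 characters) is 81 columns.
sample = "int x;\n" + "\t" * 9 + "return 0;\n"
print(overlong_lines(sample))  # prints [2]
```

The tab expansion is the part that matters: a line that looks short in an editor configured for 4-column tabs can still exceed 80 columns under the kernel's 8-column convention.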

+### Patch Submission Process
+- **Documentation/process/5.Posting.rst** - How to post patches properly
+- **Documentation/process/email-clients.rst** - Email client configuration for patches
+- **Documentation/process/applying-patches.rst** - How patches are applied
+
+### Legal and Licensing
+- **Documentation/process/license-rules.rst** - Linux kernel licensing rules
+ - Kernel is GPL-2.0 only with syscall exception
+ - All files must have proper SPDX license identifiers

The only stuff I think should be in this kind of area is commentary
about how an Agent differs from a human. "You are not a legal entity;
you cannot sign the DCO", which you get into below.

I was thinking that if we explicitly call out the GPL requirement, an
agent will avoid searching online resources and potentially embedding
code that is not licensed under the GPL.
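The SPDX requirement in the patch is the most checkable part of the licensing rules: license-rules.rst places the identifier on the first line that can hold a comment, which is line 2 for scripts whose first line is a `#!` interpreter line. A minimal Python sketch of that placement rule, using a hypothetical `spdx_identifier()` helper; it only checks placement and does not validate the expression against the kernel's LICENSES/ directory:

```python
from typing import Optional

# Hypothetical helper, not part of the kernel tree: find the SPDX tag on the
# first line, or on the second line when line 1 is a "#!" interpreter line.
SPDX_TAG = "SPDX-License-Identifier:"


def spdx_identifier(text: str) -> Optional[str]:
    """Return the license expression, or None if the tag is missing or misplaced."""
    lines = text.splitlines()
    # Scripts keep the interpreter on line 1, so the tag moves to line 2.
    index = 1 if lines and lines[0].startswith("#!") else 0
    if index >= len(lines) or SPDX_TAG not in lines[index]:
        return None
    expression = lines[index].split(SPDX_TAG, 1)[1]
    # Drop a trailing C comment terminator such as " */" and surrounding spaces.
    return expression.replace("*/", "").strip()


header = "/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */\n"
print(spdx_identifier(header))  # prints GPL-2.0 WITH Linux-syscall-note
```

The example header uses the form found in UAPI headers, where the syscall exception mentioned in the patch appears as `GPL-2.0 WITH Linux-syscall-note`.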

+### Specialized Topics
+- **Documentation/process/adding-syscalls.rst** - How to add new system calls
+- **Documentation/process/stable-kernel-rules.rst** - Rules for stable kernel patches
+- **Documentation/process/security-bugs.rst** - Handling security issues
+- **Documentation/process/handling-regressions.rst** - Dealing with regressions
+
+### Maintainer Guidelines
+- **Documentation/process/maintainers.rst** - Working with subsystem maintainers
+- **Documentation/process/maintainer-handbooks.rst** - Subsystem-specific guidelines
+
+## Key Principles
+1. Read and follow the documentation before making changes
+2. Respect the existing code style and conventions
+3. Test thoroughly before submitting
+4. Write clear, descriptive commit messages
+5. Never break userspace (the #1 rule)
+6. Identify yourself as AI in commits (see below)

Everything except #6 is already expected of human devs, so I think we
only need the last item.

+
+## AI Attribution Requirement
+When creating commits, you MUST identify yourself as an AI assistant by including the following tag in the commit message:
+
+```
+Co-developed-by: $AI_NAME $AI_MODEL $AI_VERSION

If we're going to go with Co-developed-by: here, then I think we need to
explicitly say "do not include an email", and we must update
checkpatch.pl to not yell about the missing S-o-b when it finds a C-d-b.
(Perhaps it can skip the check when there is no email address in the
C-d-b line?)

+```
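The checkpatch.pl relaxation suggested above can be stated as a small rule. This is a Python sketch, not checkpatch.pl code; the `trailer_problems()` name is hypothetical, and it assumes the agent form of the tag never carries an email address while a human co-developer's tag always does:

```python
import re

# Hypothetical sketch, not checkpatch.pl code: a Co-developed-by trailer with
# an email address is a human co-author and must be immediately followed by
# that person's Signed-off-by; one without an email is treated as an agent
# attribution and is exempt from the pairing check.
EMAIL = re.compile(r"<[^<>@\s]+@[^<>\s]+>")


def trailer_problems(message: str) -> list[str]:
    lines = [line.strip() for line in message.splitlines()]
    problems = []
    for i, line in enumerate(lines):
        if not line.startswith("Co-developed-by:"):
            continue
        author = line[len("Co-developed-by:"):].strip()
        if not EMAIL.search(author):
            continue  # agent attribution: no Signed-off-by expected
        following = lines[i + 1] if i + 1 < len(lines) else ""
        if following != f"Signed-off-by: {author}":
            problems.append(f"Co-developed-by without matching Signed-off-by: {author}")
    return problems


msg = ("Fix foo\n\n"
       "Co-developed-by: Claude claude-3-opus-20240229\n"
       "Signed-off-by: Jane Doe <jane@example.org>\n")
print(trailer_problems(msg))  # prints []
```

Keying the exemption on the absence of an email keeps the existing check intact for human co-developers, which is also why the rules would need to tell the agent explicitly not to add one.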
+
+For example:
+- `Co-developed-by: Claude claude-3-opus-20240229`
+- `Co-developed-by: GitHub-Copilot GPT-4 v1.0.0`
+- `Co-developed-by: Cursor gpt-4-turbo-2024-04-09`
+
+This transparency helps maintainers and reviewers understand that AI was involved in the development process.
+
+### Signed-off-by Restrictions
+AI assistants MUST NOT add a Signed-off-by tag pointing to themselves. The Signed-off-by tag represents a legal certification by a human developer that they have the right to submit the code under the open source license.

Hello trailing whitespace my old friend.

"Unless explicitly told otherwise, Agents must never have trailing
whitespace on any line and all files must have a final newline
character." :)
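The rule proposed above is simple enough to verify mechanically before an agent's output is committed. A minimal Python sketch, with a hypothetical `whitespace_problems()` helper that treats only spaces and tabs as trailing whitespace:

```python
# Hypothetical helper sketching the quoted rule: no trailing whitespace on
# any line, and a non-empty file must end with a newline character.
def whitespace_problems(text: str) -> list[str]:
    problems = []
    for number, line in enumerate(text.split("\n"), start=1):
        if line != line.rstrip(" \t"):
            problems.append(f"line {number}: trailing whitespace")
    if text and not text.endswith("\n"):
        problems.append("missing final newline")
    return problems


print(whitespace_problems("int x; \nreturn 0;"))
# prints ['line 1: trailing whitespace', 'missing final newline']
```

For changes already staged in git, `git diff --check` reports trailing whitespace, so a helper like this mainly matters for files an agent writes before they reach git.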

+
+Only the human user running the AI assistant should add their Signed-off-by tag to commits. The AI's contribution is acknowledged through the Co-developed-by tag as described above.

And can we please not use the term "AI"? I think "Agent" is the better
generic term as it could include other things?

Ack

--
Thanks,
Sasha