Re: [PATCH v8 4/6] rust: debugfs: Support arbitrary owned backing for File

From: Matthew Maurer
Date: Tue Jul 01 2025 - 14:12:07 EST


On Tue, Jul 1, 2025 at 8:10 AM Danilo Krummrich <dakr@xxxxxxxxxx> wrote:
>
> On Tue, Jul 01, 2025 at 04:21:56PM +0200, Greg Kroah-Hartman wrote:
> > On Tue, Jul 01, 2025 at 04:13:28PM +0200, Danilo Krummrich wrote:
> > > On Tue, Jul 01, 2025 at 03:58:45PM +0200, Greg Kroah-Hartman wrote:
> > > > On Mon, Jun 30, 2025 at 08:16:55PM +0200, Danilo Krummrich wrote:
> > > > > On Mon, Jun 30, 2025 at 10:49:51AM -0700, Matthew Maurer wrote:
> > > > > > On Mon, Jun 30, 2025 at 10:39 AM Danilo Krummrich <dakr@xxxxxxxxxx> wrote:
> > > > > > >
> > > > > > > On 6/30/25 7:34 PM, Matthew Maurer wrote:
> > > > > > > > On Mon, Jun 30, 2025 at 10:30 AM Danilo Krummrich <dakr@xxxxxxxxxx> wrote:
> > > > > > > >>
> > > > > > > >> On 6/28/25 1:18 AM, Matthew Maurer wrote:
> > > > > > > >>> + fn create_file<D: ForeignOwnable>(&self, _name: &CStr, data: D) -> File
> > > > > > > >>> + where
> > > > > > > >>> + for<'a> D::Borrowed<'a>: Display,
> > > > > > > >>> + {
> > > > > > > >>> + File {
> > > > > > > >>> + _foreign: ForeignHolder::new(data),
> > > > > > > >>> + }
> > > > > > > >>> }
> > > > > > > >>
> > > > > > > >> What's the motivation for the ForeignHolder abstraction? Why not just make it
> > > > > > > >> File<D> and store data directly?
> > > > > > > >
> > > > > > > > 1. A `File<D>` can't be held in collection data structures as easily
> > > > > > > > unless all your files contain the *same* backing type.
> > > > > > >
> > > > > > > That sounds reasonable.
> > > > > > >
> > > > > > > > 2. None of the APIs or potential APIs for `File` care about which type
> > > > > > > > it's wrapping, nor are they supposed to. If nothing you can do with a
> > > > > > > > `File` is different depending on the backing type, making it
> > > > > > > > polymorphic is just needlessly confusing.
> > > > > > >
> > > > > > > What if I want to access file.data() and do something with the data? Then I'd
> > > > > > > necessarily need to put my data in an Arc and reference count it to still be
> > > > > > > able to access it.
> > > > > > >
> > > > > > > That doesn't seem like a reasonable requirement to be able to access data
> > > > > > > exposed via debugfs.
> > > > > >
> > > > > > `pub fn data(&self) -> D` would go against my understanding of Greg's
> > > > > > request for DebugFS files to not really support anything other than
> > > > > > delete. I was even considering making `D` not be retained in the
> > > > > > disabled debugfs case, but left it in for now for so that the
> > > > > > lifecycles wouldn't change.
> > > > >
> > > > > Well, that's because the C side does not have anything else. But the C side has
> > > > > no type system that deals with ownership:
> > > > >
> > > > > In C you just stuff a pointer of your private data into debugfs_create_file()
> > > > > without any implication of ownership. debugfs has a pointer, the driver has a
> > > > > pointer. The question of the ownership semantics is not answered by the API, but
> > > > > by the implementation of the driver.
> > > > >
> > > > > The Rust API is different, and it's even implied by the name of the trait you
> > > > > expect the data to implement: ForeignOwnable.
> > > > >
> > > > > The File *owns* the data, either entirely or a reference count of the data.
> > > > >
> > > > > If the *only* way to access the data the File now owns is by making it reference
> > > > > counted, it:
> > > > >
> > > > > 1) Is additional overhead imposed on users.
> > > > >
> > > > > 2) It has implications on the ownership design of your driver. Once something
> > > > > is reference counted, you loose the guarantee the something can't out-live
> > > > > some event.
> > > > >
> > > > > I don't want that people have to stuff their data structures into Arc (i.e.
> > > > > reference count them), even though that's not necessary. It makes it easy to
> > > > > make mistakes. Things like:
> > > > >
> > > > > let foo = bar.clone();
> > > > >
> > > > > can easily be missed in reviews, whereas some contributor falsely changing a
> > > > > KBox to an Arc is much harder to miss.
> > > > >
> > > > > > If you want a `.data()` function, I can add it in,
> > > > >
> > > > > I think it could even be an implementation of Deref.
> > > > >
> > > > > > but I don't think
> > > > > > it'll improve flexibility in most cases. If you want to do something
> > > > > > with the data and it's not in an `Arc` / behind a handle of some kind,
> > > > > > you'll need something providing threadsafe interior mutability in the
> > > > > > data structure. If that's a lock, then I have a hard time believing
> > > > > > that `Arc<Mutex<T>>`(or if it's a global, a `&'static Mutex<T>`, which
> > > > > > is why I added that in the stack) is so much more expensive than
> > > > > > `Box<Mutex<T>>` that it's worth a more complex API. If it's an atomic,
> > > > > > e.g. `Arc<AtomicU8>`, then I can see the benefit to having
> > > > > > `Box<AtomicU8>` over that, but it still seems so slim that I think the
> > > > > > simpler "`File` is just a handle to how long the file stays alive, it
> > > > > > doesn't let you do anything else" API makes sense.
> > > > >
> > > > > I don't really see what is complicated about File<T> -- it's a File and it owns
> > > > > data of type T that is exposed via debugfs. Seems pretty straight forward to me.
> > > > >
> > > > > Maybe the performance cost is not a huge argument here, but maintainability in
> > > > > terms of clarity about ownership and lifetime of an object as explained above
> > > > > clearly is.

The part that is not straightforward is that the primary purpose of
`T` may not be "to be exposed via DebugFS", and leaves us in the
unusual world where DebugFS (or at least our bindings to it) are
load-bearing in builds that have DebugFS disabled. It can work, but it
definitely seems confusing and is privileging DebugFS holding a
reference to a data structure over other things holding a reference to
it.

> > > >
> > > > I'm agreeing here. As one of the primary users of this api is going to
> > > > be a "soc info" module, like drivers/soc/qcom/socinfo.c, I tried to make
> > > > an example driver to emulate that file with just a local structure, but
> > > > the reference counting and access logic just didn't seem to work out
> > > > properly. Odds are I'm doing something stupid though...

I have a version of this that works with my initial driver, and keep
intending to revise it to work on top of the new one, but I keep
getting requests for API changes before I get around to reworking it
;)

The way I intended my `ForeignOwnable` variant of this to work was
that I would have a *single* struct that contained all the scanned
data. For the socinfo module, we know how big this could be, so we
could put this as a `static` in the module, in which case we'd have
`&'static MyInfo`. If we didn't want to do that or didn't know, we'd
produce an `Arc<MyInfo>`. Then, for each file, we'd create a file that
passed in either `my_arc_info.clone()` or `my_info` if we had the
static reference, with the functions that do the printing
distinguishing the behavior of each file.

Doing the file-owns-the-specific-field version *could* work, but means
that the debugfs tree code would be interleaved with the calls to
parsing code, or that we'd need two copies of the struct, an initial
one with all the parsed data, and a follow-up one that has has them
all wrapped in `File`.

> > >
> > > I think it technically works, but it imposes semantics on drivers that we
> > > shouldn't do; see the example below.
> > >
> > > > So a file callback IS going to want to have access to the data of type T
> > > > that is exposed somehow.
> > >
> > > With the current API we would need this:
> > >
> > > struct GPU {
> > > fw: Arc<Firmware>,
> > > fw_file: debugfs::File,
> > > }
> > >
> > > and then I would initialize it the following way:
> > >
> > > let fw = Arc::new(Firmware::new(), GFP_KERNEL)?;
> > > let fw_file = dir.create_file("firmware", fw.clone());
> > >
> > > fw.do_something();
> > >
> > > This is bad, because now my Firmware instance in GPU needs to be reference
> > > counted, even though it should *never* out-live the GPU instance. This is error
> > > prone.

If `Firmware` outliving the GPU instance causes a bug, not just a
resource leak, I'd argue that you should be passing a narrower handle
to your debugfs file. If it just causes a resource leak, then I'd
argue that if you leak a resource (the debugfs file itself), then
leaking a resource is not unexpected.

For the narrower handle example, consider e.g.
```
struct Firmware {
info: Arc<FirmwareInfo>,
my_handle: Handle // Some handle whose drop is load bearing
}
// ...
let fw: Firmware = build_fw()?;
let fw_file = fw.info.clone();
fw.do_something();
```

> >
> > Agreed, AND you just created a new fw structure that you really didn't
> > need, wasting memory.
>
> In case you refer to fw.clone(), since fw is an Arc<Firmware>, clone() just
> creates a new reference count to the same object.
>
> > > Instead this should just be:
> > >
> > > struct GPU {
> > > fw: debugfs::File<Firmware>,
> > > }
> > >
> > > and then I would initialize it the following way:
> > >
> > > let fw = KBox::new(Firmware::new(), GFP_KERNEL)?;
> > > let file = dir.create_file("firmware", fw);
> > >
> > > // debugfs::File<Firmware> dereferences to Firmware
> > > file.do_something();
> > >
> > > // Access to fw is prevented by the compiler, since it has been moved
> > > // into file.
> > >
> > > This is much better, since now I have the guarantee that my Firmare instance
> > > can't out-live the GPU instance.
> >
> > That's better, yes, but how would multiple files for the same
> > "structure" work here? Like a debugfs-file-per-field of a structure
> > that we often have?

This seems odd to me because with this model the DebugFS file itself
becomes deeply involved with the ownership of the devices or other
data. There are a few issues I think this creates:
1. It discourages file-per-field, because you can no longer have the
DebugFS file uniquely own something, which you're now trying to
encourage. This seems likely to result in people having debugfs files
that render out structured data, which I think we generally don't
want?
2. If I have a module and want to add a new DebugFS file to it, the
change will edit the core structures of the drivers rather than just
adding an auxiliary field. This is partially remedied if we make
`File<D>` implement `Deref<Target=D>`. It will still either incur
churn or imprecision around commonly derived traits like equality and
debug. If I have `#[derive(PartialEq)]` on my struct including
`DeviceData`, and then I wrap that into a `File<DeviceData>`, we'll
get stuck with a quandry:
a. Should files `x == y` if the underlying private data is equal?
This has the benefit of preserving the behavior of the derive, but
means that distinct files may compare equal if they have the same
contents.
b. Should files `x == y` only be true if the underlying *file* is
the same? This seems logical for *file* equality, but will break
equality on the field or structures including it.
This will also be an incompatible change for anyone already using an
explicit `.deref()` or an explicit `impl Deref<Target = T>` method to
work with the field, because while deref coercion will automatically
chain, `impl Deref<Target = T>` *cannot* automatically chain.
3. If you create an object specifically for debugging, then it will
*have* to be created and kept around even if DebugFS is disabled,
because we can't tell whether the user is relying on its existence.
(My current implementation keeps it around in that case as well, but
that's not guaranteed or required - we could cause it to just
immediately drop data that someone tried to attach.) This means that
rather than relying on the API to determine whether the information is
needed, the modules will need to check the config if they want to get
rid of it.

>
> That is a very good question and I thought about this as well, because with only
> the current API this would require us to have more and more dynamic allocations
> if we want to have a more fine grained filesystem representations of structures.
>
> The idea I have for this is to use pin-init, which we do in quite some other
> places as well.
>
> I think we can add an additional API like this:
>
> impl Dir {
> pub fn create_file<T>(&self, data: impl PinInit<T>) -> impl PinInit<Self> {
> pin_init!(Self {
> data <- data,
> ...
> })
> }
> }
>
> This allows us to do things like:
>
> #[pin_data]
> struct Firmware {
> #[pin]
> minor: debugfs::File<u32>,
> #[pin]
> major: debugfs::File<u32>,
> #[pin]
> buffer: debugfs::File<[u8]>,

Nit: This can't work, because a `&[u8]` is double-width, and we only
get to send one pointer into the file. We'd need to add a requirement
`T: Sized` if we really wanted to have `File` embed the data.

> }
>
> impl Firmware {
> pub fn new(&dir: debugfs::Dir, buffer: [u8]) -> impl PinInit<Self> {
> pin_init!(Self {
> minor <- dir.create_file("minor", 1),
> major <- dir.create_file("major", 2),
> buffer <- dir.create_file("buffer", buffer),
> })
> }
> }
>
> // This is the only allocation we need.
> let fw = KBox::pin_init(Firmware::new(...), GFP_KERNEL)?;
>
> With this everything is now in a single allocation and since we're using
> pin-init, Dir::create_file() can safely store pointers of the corresponding data
> in debugfs_create_file(), since this structure is guaranteed to be pinned in
> memory.
>
> Actually, we can also implement *only this*, since with this my previous example
> would just become this:

If we implement *only* pinned files, we run into an additional problem
- you can't easily extend a pinned vector. This means that you cannot
have dynamically created devices unless you're willing to put every
new `File` into its own `Box`, because you aren't allowed to move any
of the previously allocated `File`s for a resize.

Where previously you would have had

```
debug_files: Vec<File>
```

you would now have

```
debug_files: Vec<PinBox<File<T>>>
```

If it's easy to use the model you described earlier where the sole
owner of the data is the debugfs file, then this is mostly fine,
because to use the old version, you'd already have it live in a
`ForeignOwnable`, which most likely means its own allocation. However,
if you're already forced into an environment where the objects you're
embedding into a file are reference counted, you now have an extra
allocation required for every `File`. Because the `File` has to be
able to respond to `data()`, we also won't have these `File`s truncate
to zero size in the no debugfs scenario.

This means that if you want a `File` to render an object that is
reference counted, and they're dynamically created and destroyed, you
will now be required to incur an allocation per file, even when
DebugFS is disabled.

>
> struct GPU {
> fw: debugfs::File<Firmware>,
> }
>
> let file = dir.create_file("firmware", Firmware::new());
> let file = KBox::pin_init(file, GFP_KERNEL)?;
>
> // debugfs::File<Firmware> dereferences to Firmware
> file.do_something();
>
> Given that, I think we should change things to use pin-init right away for the
> debugfs::File API.