Re: [PATCH v4 00/14] Add kdbus implementation

From: David Herrmann
Date: Wed Mar 25 2015 - 13:29:27 EST


Hi

On Tue, Mar 24, 2015 at 12:24 AM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> On Mon, Mar 23, 2015 at 8:28 AM, David Herrmann <dh.herrmann@xxxxxxxxx> wrote:
>> On Thu, Mar 19, 2015 at 4:48 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>>> On Thu, Mar 19, 2015 at 4:26 AM, David Herrmann <dh.herrmann@xxxxxxxxx> wrote:
>>>> metadata handling is local to the connection that sends the message.
>>>> It does not affect the overall performance of other bus operations in
>>>> parallel.
>>>
>>> Sure it does if it writes to shared cachelines. Given that you're
>>> incrementing refcounts, I'm reasonably sure that you're touching lots
>>> of shared cachelines.
>>
>> Ok, sure, but it's still mostly local to the sending task. We take
>> locks and ref-counts on the task-struct and mm, which are for the
>> most part local to the CPU the task runs on. But this is inherent to
>> accessing this kind of data, which is the fundamental difference in
>> our views here, as seen below.
>
> You're also refcounting the struct cred

No?

We do ref-count the group-info, but that is actually redundant, as we
just copy the IDs. We should drop this, since the group-info of
'current' can be accessed directly. I've noted it down.

> and there's no good reason
> for that to be local. (It might be a bit more local than intended
> because of the absurd things that the key subsystem does to struct
> cred, but IMO users should turn that off or the kernel should fix it.)
>
> Even more globally, I think you're touching init_user_ns's refcount in
> most scenarios. That's about as global as it gets.

get_user_ns() in metadata.c is a workaround (as the comment there
explains). With better export-helpers for caps, we can simply drop it.
It's conditional on KDBUS_ATTACH_CAPS, anyway.

> (Also, is there an easy benchmark to see how much time it takes to
> send and receive metadata? I tried to get the kdbus test to do this,
> and I failed. I probably did it wrong.)

Here's a patch for the out-of-tree kdbus:
https://gist.github.com/dvdhrm/3ac4339bf94fadc13b98

Update it to pass _KDBUS_ATTACH_ALL for both arguments of
kdbus_conn_update_attach_flags().

>>>> Furthermore, it's way faster than collecting the "same" data
>>>> via /proc, so I don't think it slows down the overall transaction at
>>>> all. If a receiver doesn't want metadata, it should not request it (by
>>>> setting the receiver-metadata-mask). If a sender doesn't like the
>>>> overhead, it should not send the metadata (by setting the
>>>> sender-metadata-mask). Only if both peers set the metadata mask
>>>> will it be transmitted.
>>>
>>> But you're comparing to the wrong thing, IMO. Of course it's much
>>> faster than /proc hackery, but it's probably much slower to do the
>>> metadata operation once per message than to do it when you connect to
>>> the endpoint. (Gah! It's a "bus" that could easily have tons of
>>> users but a single "endpoint". I'm still not used to it.)
>>
>> Yes, of course your assumption is right if you compare against
>> per-connection caches, instead of per-message metadata. But we do
>> support _both_ use-cases, so we don't impose any policy.
>> We still believe "live"-metadata is a crucial feature of kdbus,
>> despite the known performance penalties.
[...]
> This is even more true if this feature
> is *inconsistent* with legacy userspace (i.e. userspace dbus).

Live metadata is already supported on UDS via SCM_CREDENTIALS; we just
extend the concept to other metadata items. It's not a new invention
of ours. A Debian code-search for SO_PASSCRED and SCM_CREDENTIALS
turns up lots of results.

Netlink, as a major example of an existing bus API, already uses
SCM_CREDENTIALS as the primary way to transmit metadata.
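
For reference, a minimal sketch of what that looks like on the
receiving side of a UDS socket today ('sock' is assumed to be a
connected AF_UNIX socket):

#define _GNU_SOURCE             /* struct ucred, SCM_CREDENTIALS */
#include <string.h>
#include <stdio.h>
#include <sys/socket.h>

/*
 * Receive one message plus the sender's pid/uid/gid, which the kernel
 * captured at send time.
 */
static int recv_with_creds(int sock)
{
	char data[256];
	union {
		char buf[CMSG_SPACE(sizeof(struct ucred))];
		struct cmsghdr align;
	} ctrl;
	struct iovec iov = { .iov_base = data, .iov_len = sizeof(data) };
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = ctrl.buf,
		.msg_controllen = sizeof(ctrl.buf),
	};
	struct cmsghdr *cmsg;
	int on = 1;

	/* ask the kernel to attach SCM_CREDENTIALS to incoming messages */
	if (setsockopt(sock, SOL_SOCKET, SO_PASSCRED, &on, sizeof(on)) < 0)
		return -1;

	if (recvmsg(sock, &msg, 0) < 0)
		return -1;

	for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
		if (cmsg->cmsg_level == SOL_SOCKET &&
		    cmsg->cmsg_type == SCM_CREDENTIALS) {
			struct ucred cred;

			memcpy(&cred, CMSG_DATA(cmsg), sizeof(cred));
			printf("pid=%ld uid=%ld gid=%ld\n", (long)cred.pid,
			       (long)cred.uid, (long)cred.gid);
		}
	}

	return 0;
}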

> I could be wrong about the lack of use cases. If so, please enlighten me.

We have several dbus APIs that allow clients to register as a special
handler/controller/etc. (e.g., see systemd-logind's TakeControl()). The
API provider checks the privileges of a client on registration and
then just tracks the client ID. This way, the client can be privileged
when asking for special access, then drop privileges and still use the
interface. You cannot re-connect in between, as the API provider
tracks your bus ID. Without message-metadata, all your (other) calls
on this bus would always be treated as privileged. We *really* want to
avoid this.
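
To illustrate the pattern, a rough sketch against sd-bus (the session
object path and the uid/gid to drop to are made up for the example):

#include <systemd/sd-bus.h>
#include <unistd.h>

/*
 * Register as session controller while still privileged, then drop
 * privileges. logind keeps tracking us by our unique bus name, so
 * later calls on this very connection stay authorized; that is why
 * re-checked per-message metadata matters.
 */
static int take_control_and_drop(sd_bus *bus)
{
	sd_bus_error error = SD_BUS_ERROR_NULL;
	int r;

	r = sd_bus_call_method(bus,
			       "org.freedesktop.login1",
			       "/org/freedesktop/login1/session/self",
			       "org.freedesktop.login1.Session",
			       "TakeControl",
			       &error, NULL,
			       "b", 0);		/* force=false */
	if (r < 0) {
		sd_bus_error_free(&error);
		return r;
	}

	/* drop to an unprivileged uid/gid (values are examples) */
	if (setgid(1000) < 0 || setuid(1000) < 0)
		return -1;

	/* ... keep using 'bus'; we are still the session controller ... */
	return 0;
}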

Another example is logging, where we want exact data at the time a
message is logged; otherwise, the data is useless. With
message-metadata, you can figure out the exact situation a process was
in when a specific message was logged. Moreover, such data often
cannot be read from /proc at all, as the process might already be
dead. That is a _real_ problem right now!
Similarly, system monitoring wants message-metadata for the same
reasons. And it needs to be reliable; you don't want malicious
sandboxes to mess with your logs.
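
The /proc race is easy to see in code. A sketch (assuming the
sender's pid arrived via SCM_CREDENTIALS as above):

#include <errno.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Try to resolve the sender's comm after the fact. If the sender
 * exited between sending and this lookup, the open fails and the
 * metadata is lost for good; metadata captured at send time does not
 * have this race.
 */
static int lookup_sender_comm(pid_t pid, char *buf, size_t len)
{
	char path[64];
	FILE *f;

	snprintf(path, sizeof(path), "/proc/%ld/comm", (long)pid);
	f = fopen(path, "re");
	if (!f)
		return -errno;	/* typically ENOENT: process already gone */

	if (!fgets(buf, len, f)) {
		fclose(f);
		return -EIO;
	}

	fclose(f);
	return 0;
}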

kdbus is a _bus_, not a p2p channel. Thus, a peer may talk to multiple
destinations, and it may want to look different to each of them. DBus
method-calls allow 'syscall'-ish behavior when calling into other
processes. We *want* to be able to drop privileges after doing process
setup. We want further bus-calls to no longer be treated as privileged.

Furthermore, DBus was designed to allow peers to track other peers
(which is why it has always had the NameOwnerChanged signal). This is
an essential feature that simplifies access-management considerably,
as you can cache access decisions together with the unique name of a
peer. We only open a single connection to a bus. glib, libdbus, efl,
ell, qt, sd-bus, and others use cached bus-connections that are shared
by all code of a single thread. Hence, the bus connection is kinda
part of the process itself, like stdin/stdout. Without
message-metadata, it is impossible to ever drop privileges on a bus
without losing all state.
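
For illustration, a sketch of that tracking with sd-bus;
drop_cached_privileges() is a hypothetical stand-in for whatever the
service caches per peer:

#include <systemd/sd-bus.h>

/* hypothetical: forget whatever we cached for this unique name */
static void drop_cached_privileges(const char *unique_name)
{
	(void)unique_name;	/* service-specific */
}

/*
 * NameOwnerChanged(name, old_owner, new_owner) with an empty new_owner
 * means the peer is gone, so its cached state must be invalidated.
 */
static int name_owner_changed(sd_bus_message *m, void *userdata,
			      sd_bus_error *ret_error)
{
	const char *name, *old_owner, *new_owner;
	int r;

	r = sd_bus_message_read(m, "sss", &name, &old_owner, &new_owner);
	if (r < 0)
		return r;

	if (!*new_owner)
		drop_cached_privileges(name);	/* peer disconnected */

	return 0;
}

/* install the match on the single, shared connection */
static int watch_peers(sd_bus *bus)
{
	return sd_bus_add_match(bus, NULL,
				"type='signal',"
				"sender='org.freedesktop.DBus',"
				"interface='org.freedesktop.DBus',"
				"member='NameOwnerChanged'",
				name_owner_changed, NULL);
}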

Thanks
David