Re: [PATCH v3 2/3] misc: bcm-vk: add Broadcom VK driver

From: Scott Branden
Date: Thu Sep 24 2020 - 17:40:19 EST


Hi Greg,

Responses below.  I'll send out the split up patches hopefully today/tomorrow
which may make it a bit easier to understand/comment on.

On 2020-09-23 10:08 p.m., Greg Kroah-Hartman wrote:
> On Wed, Sep 23, 2020 at 09:43:55PM -0700, Scott Branden wrote:
>>>> +struct bcm_vk_tty {
>>>> + struct tty_port port;
>>>> + uint32_t to_offset; /* bar offset to use */
>>>> + uint32_t to_size; /* to VK buffer size */
>>>> + uint32_t wr; /* write offset shadow */
>>>> + uint32_t from_offset; /* bar offset to use */
>>>> + uint32_t from_size; /* from VK buffer size */
>>>> + uint32_t rd; /* read offset shadow */
>>> nit, these "unit32_t" stuff really doesn't matter in the kernel, 'u32'
>>> is a better choice overall. Same for u8 and others, for this whole
>>> driver.
>> Other than personal preference, I don't understand how 'u32' is better.
>> uint32_t follows the ANSI stdint.h.  It allows for portable code without
>> the need to define custom u32 types.
> The ANSI namespace does not work in the kernel, which is why we have our
> own types that pre-date those, and work properly everywhere in the
> kernel.
>
>> stdint types are used in many drivers in the linux kernel already.
>> We would prefer to keep our code as portable as possible and use
>> stdint types in the driver.
> You aren't porting this code to other operating systems easily, please
> use the kernel types :)
>
> And yes, these types are used in other parts, but when you have 25
> million lines of code, some crud does slip in at times...
OK, will reformat.
Seems like the stdint typedefs should not have been added to the linux/types.h
back in ancient kernel times.  If they are not to be used in the kernel, then they should be wrapped in a #ifndef __KERNEL__.
>
>>>> + pid_t pid;
>>>> + bool irq_enabled;
>>>> + bool is_opened; /* tracks tty open/close */
>>> Why do you need to track this? Doesn't the tty core handle this for
>>> you?
>> I have tried using tty_port_kopened() and it doesn't seem to work.
>> Will need to debug some more unless you have another suggested function to use.
> You didn't answer _why_ you need to track this. A tty driver shouldn't
> care about this type of thing.
We want to leave the data in the shared buffers coming from the card until someone is ready to read them.
So we track whether the particular tty device is open.  If the port is not open, we don't increment
our read pointer and leave the data in the buffer.  If it overflows, so be it, we'll get whatever data is in it when we open the tty device.
.
>
>>>> + struct workqueue_struct *tty_wq_thread;
>>>> + struct work_struct tty_wq_work;
>>>> +
>>>> + /* Reference-counting to handle file operations */
>>>> + struct kref kref;
>>> And a kref?
>>>
>>> What is controlling the lifetime rules of your structure?
>>>
>>> Why a kref?
>>>
>>> Why the tty ports?
>>>
>>> Why the misc device?
>>>
>>> This feels really crazy to me...
>> Comments mostly from Desmond here:
>>
>> Yes, we have created a PCIe centric driver that combines with both a misc devices on top (for the read/write/ioctrl), and also ttys.
>> The device sits on PCIe but we are using the misc device for accessing it.
>> tty is just another on top.  I don't think this is that uncommon to have a hybrid driver.
> Ugh, yes, it is uncommon because those are two different things. Why do
> you need/want a misc driver to control a tty device? Why do you need a
> tty device? What really is this beast?
The beast consists of a PCI card.  The PCI card communicates to the host via shared memory in BAR space and MSIX interrupts.
The host communicates through shared memory in BAR space and generates mailbox interrupts.
In addition the PCI card has DMA access to host memory to access data for processing.
The misc driver handles these operations.  Multiple user space processes are accessing the misc device at the same time
to perform simultaenous offload operations.  Each process opens the device and sends multiple commands with write operations
and receives solicited and unsolicited responses read read operations.  In addition there are sysfs entries to collect statistics and errors.
Ioctl operations are present for loading new firmware images to the card and reset operations.

In addition, the card has multiple physical UART connections to access consoles on multi processors on the card.
But, in a real system there is no serial cable connected from the card to the server.  And, there could be 16 PCIe cards
plugged into a server so that would make it even more unrealistic to access these consoles via a UART.

Fortunately the data sent out to each physical UART exists in a circular buffer.  These circular buffer can be access via the host in BAR space.
So we added tty device nodes per UART (2 per PCIe).  These are not the misc device node, these are seperate tty device nodes that are opened
and behave like tty devices.

We are not using the misc driver to control tty.
  1) The PCI driver is the foundaton of it which provides the physical interface to the card, 2) misc driver is a presentation to user space to do its regular message - this allows user to treat it as a char-device, so its like fopen/read/write/close etc. and 3) tty is an additional debug channel which could be on or off.

Please suggest how we can access the misc device and tty device nodes other than how we have implemented it in a single driver?
>
> We got rid of the old "control path" device nodes for tty devices a long
> time ago, this feels like a return to that old model, is that why you
> are doing this?
I don't know what old "control path" you are referring to and what the "new" path is?
>
> But again, I really don't understand what this driver is trying to
> control/manage, so it's hard to review it without that knowledge.
We have circular buffers in PCI memory that contain serial data.  We need to be able to open/close tty devices in linux for console operations.
And also to be able to perform operations such as lrz/sz.  This needs to have the same ability as if a physical serial cable was connected between the server and the card.  So user is able to either plug a UART cable in or open the "virtual" UART accessed over PCIe.

For misc device, the PCI memory is the physical interface, and on top it is a queue-messaging scheme for information exchange.  This is more for the control-path operations, cmd in/out etc.
>
>> Since we have a hybrid of PCIe + misc + tty, it means that we could simultaneously have opening dev/node to read/write (multiple) + tty o going.
> That's almost always a bad idea.
The mulitple users is a must for us.  We have multiple individual user space processes opening the misc device and communicating to it.
We also have 2 tty device nodes per card that can be opened/closed at any time.
And this is all on a PCIe card with shared memory, registers, and MSIX interrupts.
The card can be reset at any time, crash, or have PCIe rescan, etc.  User space processes need to be signalled when events detected so they don't hang.
What do you suggest otherwise?
>
>> Since the struct is embedded inside the primary PCIe structure, we need a way to know when all the references are done, and then at that point we could free the primary structure.
>> That is the reason for the kref.  On PCIe device removal, we signal the user space process to stop first, but the data structure can not be freed until the ref goes to 0.
> Again, you can not have multiple reference count objects controling a
> single object. That way is madness and buggy and will never work
> properly.
We're using this driver in systems that require a high degree of stability and reliability.
We have done extensive testing and don't see the bugs you are referring to.
It's working properly.

> You can have different objects with different lifespans, which, if you
> really really want to do this, is the correct way. Otherwise, stick
> with one object and one reference count please.
If we draw a hierachy, bcm_vk object encaps the misc_object + pci_object + tty_object.  The kref is for the bcm_vk object or the upper layer one.  It is to guarantee that all the sub-level objects are freed that we free this upper-level one.  I guess this is a result of the hybrid design, and this is mainly to avoid issue when we say "echo 1 > pci->remove".  The global structure under PCI will be removed, but we could not do so unless all its  misc_object/pci_object/tty_object are gone.  We did observe corner case if we don't have this that some of the apps who have opened the misc_device will seg-fault as it access the datastructure after the pcie->remove is done.  Not very common but more a corner case depending on timing.
> thanks,
>
> greg k-h
Regards,
 Scott

Attachment: smime.p7s
Description: S/MIME Cryptographic Signature