Re: [PATCH v2] uio:uio_pci_generic:Don't clear master bit when the process does not exit

From: Weifeng Su
Date: Tue Feb 21 2023 - 07:48:46 EST


On 2023/2/21 2:11, Greg KH wrote:
On Tue, Feb 21, 2023 at 01:10:44AM +0800, Su Weifeng wrote:
From: Weifeng Su <suweifeng1@xxxxxxxxxx>

The /dev/uioX device has concurrent operations in a few scenarios.

For example, when a process using the device exits abnormally,
the management program starts the same process to operate the device.
When the process exits and closes the /dev/uioX device,
the master bit of the device is cleared. In this case, if the
new process is issuing commands, I/Os are suspended and cannot be
automatically recovered.

Therefore, reference counting is added to clear the master bit
only when the last process exits.

Signed-off-by: Weifeng Su <suweifeng1@xxxxxxxxxx>
---
The difference between the first patch and the first patch is that
the reference counting operation is performed using the atomic semantics,
just like other drivers under UIO:
cdfa835c6e5e87d145f("uio_hv_generic: defer opening vmbus until first use").

And I would claim that that change too is incorrect.

Did you test this with dup()? Lots of open/close cycles on the same
device node? Passing around the file descriptor?
The patch is verified to be able to fix the low-probability problem in our scenario. At the same time, we also perform the dup test on the latest kernel(6.2.0-rc8-g0f5e65cd8f9b-dirty) based on your suggestions. The test code model is as follows:
...
int main()
{
int fd = open("/dev/uio0", O_RDWR);
if (fd < 0) {
printf("error in open /dev/uio0\n");
return -1;
}
int dup_fd = dup(fd);
while (true) {
sleep(1);
}
}

After kill the process, The test results are as follows:
[ 335.730916] swf call uio open
[ 338.307306] swf call uio release

dup does not cause uio_pci_generic.c:open or uio_pci_generic.c:release reference counting exceptions.
PS: In this example, a print is added to the entries of uio_open and uio_release of the driver to confirm the function invoking status.

After reading the "dup" code, we confirm that the struct file uses the f_count protect the same file in the process. The file opened by the "dup" only adds reference counting and does not invoke the open operation of the driver. Disabling the fd only decrementes the f_count count. The release function of the driver is invoked to clear resources only when f_count is 0. Different processes use different struct files to open the same file, and f_count cannot enable the constraint function. In this case, the driver needs to handle the concurrency problem of multiple processes,It's like this patch
cdfa835c6e5e87d145f("uio_hv_generic: defer opening vmbus until first use")

Logically, this is identical to your previous change, so why should it
be accepted?

Again, why not just use a real PCI driver for your device?
We use the uio_pci_generic driver because we use the DPDK, which is a user-mode development platform on which you can develop the user-mode driver.

thanks,

greg k-h

Best regards,
Weifeng Su