Re: [PATCH v2 3/3] libkmod-module: add support for a patient module removal option

From: Lucas De Marchi
Date: Wed Sep 29 2021 - 14:48:40 EST


On Mon, Sep 27, 2021 at 06:38:40PM -0700, Luis Chamberlain wrote:
On Thu, Sep 23, 2021 at 01:51:56AM -0700, Lucas De Marchi wrote:
On Mon, Aug 09, 2021 at 10:16:02PM -0700, Luis Chamberlain wrote:
The story was not quite like that. It wasn't removed "in favor for a 10
second sleep" in the sense that the sleep would replace the wait.

The reasoning was actually "this wait logic in the kernel is complex and
buggy, let's try to remove it". So we decided to deprecate it and add
a sleep to rmmod to see if anyone complained. After a year with no
complaints we removed it from the kernel. This was all after noticing we
had never implemented the wait logic in modprobe - it was only done in rmmod.

OK, fixed the commit log, thanks!

> --- a/libkmod/libkmod-module.c
> +++ b/libkmod/libkmod-module.c
> @@ -30,6 +30,9 @@
> #include <stdlib.h>
> #include <string.h>
> #include <unistd.h>
> +#include <poll.h>
> +#include <time.h>
> +#include <math.h>
> #include <sys/mman.h>
> #include <sys/stat.h>
> #include <sys/syscall.h>
> @@ -802,6 +805,143 @@ KMOD_EXPORT int kmod_module_remove_module(struct kmod_module *mod,
> return err;
> }
>
> +static int timespec_to_ms(struct timespec *t)
> +{
> + return (t->tv_sec * 1000) + lround(t->tv_nsec / 1000000);
> +}
> +
> +static int time_delta_ms(struct timespec *before, struct timespec *after)
> +{
> + if (!before || !after)
> + return 0;
> + return timespec_to_ms(after) - timespec_to_ms(before);
> +}

we have a similar thing in util.[ch]

Alright, this OK?

diff --git a/shared/util.c b/shared/util.c
index b487b5f..b911e63 100644
--- a/shared/util.c
+++ b/shared/util.c
@@ -466,6 +466,19 @@ unsigned long long ts_usec(const struct timespec *ts)
(unsigned long long) ts->tv_nsec / NSEC_PER_USEC;
}

+unsigned long long ts_msec(const struct timespec *ts)
+{
+ return ts_usec(ts) / 1000;
+}
+
+unsigned long long ts_delta_ms(const struct timespec *before,
+ const struct timespec *after)
+{
+ if (!before || !after)
+ return 0;
+ return ts_msec(after) - ts_msec(before);
+}
+
unsigned long long stat_mstamp(const struct stat *st)
{
#ifdef HAVE_STRUCT_STAT_ST_MTIM
diff --git a/shared/util.h b/shared/util.h
index c6a31df..f8c28e7 100644
--- a/shared/util.h
+++ b/shared/util.h
@@ -43,6 +43,9 @@ int mkdir_p(const char *path, int len, mode_t mode);
int mkdir_parents(const char *path, mode_t mode);
unsigned long long stat_mstamp(const struct stat *st);
unsigned long long ts_usec(const struct timespec *ts);
+unsigned long long ts_msec(const struct timespec *ts);
+unsigned long long ts_delta_ms(const struct timespec *before,
+ const struct timespec *after);

/* endianess and alignments */
/* ************************************************************************ */
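
A minimal sketch of the arithmetic these helpers are after, not the kmod implementation: whole milliseconds are tv_sec times 1000 plus tv_nsec divided by 1000000, and a delta is the difference of two such values. The sketch_ names and macros are hypothetical, and clamping a backwards delta to 0 is an assumption made here to avoid unsigned wraparound.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Hypothetical constants for this sketch only. */
#define SKETCH_MSEC_PER_SEC  1000ULL
#define SKETCH_NSEC_PER_MSEC 1000000ULL

/* Hypothetical helper, not the kmod API: timespec to whole milliseconds. */
static unsigned long long sketch_ts_msec(const struct timespec *ts)
{
	assert(ts != NULL);
	return (unsigned long long) ts->tv_sec * SKETCH_MSEC_PER_SEC +
	       (unsigned long long) ts->tv_nsec / SKETCH_NSEC_PER_MSEC;
}

/*
 * Elapsed milliseconds between two readings. Returns 0 if either pointer
 * is NULL or if "after" is not later than "before" (assumed behavior).
 */
static unsigned long long sketch_ts_delta_ms(const struct timespec *before,
					     const struct timespec *after)
{
	unsigned long long b, a;

	if (!before || !after)
		return 0;
	b = sketch_ts_msec(before);
	a = sketch_ts_msec(after);
	return a > b ? a - b : 0;
}
```

With CLOCK_MONOTONIC readings the second value should never be earlier than the first, so the clamp only matters if the arguments are swapped by mistake.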

> +/**
> + * kmod_module_remove_module_wait:
> + * @mod: kmod module
> + * @flags: flags to pass to Linux kernel when removing the module. The only valid flag is
> + * KMOD_REMOVE_FORCE: force remove module regardless if it's still in
> + * use by a kernel subsystem or other process;
> + * KMOD_REMOVE_NOWAIT is always enforced, causing us to pass O_NONBLOCK to
> + * delete_module(2). We do the waiting in userspace, if a wait was desired.
> + *
> + * Remove a module from Linux kernel patiently.
> + *
> + * Returns: 0 on success or < 0 on failure.
> + */
> +KMOD_EXPORT int kmod_module_remove_module_wait(struct kmod_module *mod,
> + unsigned int flags,
> + bool wait)

why do you have kmod_get_refcnt_timeout/kmod_set_refcnt_timeout instead
of just doing s/bool wait/unsigned int wait_msec/?

Because it lets us do a smaller change on the respective tools:

tools/modprobe.c- flags |= KMOD_REMOVE_FORCE;
tools/modprobe.c-
tools/modprobe.c: err = kmod_module_remove_module_wait(mod, flags, do_remove_patient);
tools/modprobe.c- if (err == -EEXIST) {
tools/modprobe.c- if (!first_time)
--
tools/remove.c- goto unref;
tools/remove.c-
tools/remove.c: err = kmod_module_remove_module_wait(mod, 0, do_remove_patient);
tools/remove.c- if (err < 0)
tools/remove.c- goto unref;
--
tools/rmmod.c- }
tools/rmmod.c-
tools/rmmod.c: err = kmod_module_remove_module_wait(mod, flags,
tools/rmmod.c- do_remove_patient);
tools/rmmod.c- if (err < 0) {

That is, the timeout is a property of the kmod context.

> + if ((refcnt <= 0) || (refcnt > 0 && !wait)) {
> + NOTICE(mod->ctx, "%s refcnt is %d\n", mod->name, (int) refcnt);
> + err_time = clock_gettime(CLOCK_MONOTONIC, &t2);
> + if (err_time != 0)
> + kmod_set_removal_timeout(mod->ctx, 0);

I don't follow why kmod_module_get_refcnt_wait() is setting the removal
timeout at all. This seems to be doing it behind the user's back.

Because if clock_gettime() returns something other than 0 then
your clock is messed up and you should not be using a timeout, so
yes, we correct that then. We can scream loud, or use a default.

I figured not using one would be better in that case.

The idea of using the refcnt fd was actually that then
users could integrate it on their mainloops (probably using epoll). And
then the same impl could be shared by kmod_module_remove_module_wait(),
which would do a select().

This seems more like a kmod_module_refcnt_wait_zero() using poll()
+ adjusting the timeout
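
One way to read the suggestion above, as a rough sketch rather than actual libkmod code: loop until the refcount reaches zero, sleep between checks with poll(), and shrink the remaining timeout on each pass using CLOCK_MONOTONIC. Every sketch_ name is hypothetical; reading the refcount is abstracted behind a callback, and sketch_countdown only stands in for a real reader.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <time.h>

/* Hypothetical callback: returns the current refcount, or < 0 on error. */
typedef int (*sketch_refcnt_fn)(void *data);

/* Monotonic clock in milliseconds, or -1 if clock_gettime() fails. */
static long long sketch_now_ms(void)
{
	struct timespec ts;

	if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
		return -1;
	return (long long) ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Returns 0 once the count reaches zero, -ETIMEDOUT if timeout_ms elapses
 * first, or another negative value on error.
 */
static int sketch_refcnt_wait_zero(sketch_refcnt_fn get, void *data,
				   int timeout_ms, int interval_ms)
{
	long long start = sketch_now_ms();

	if (start < 0)
		return -errno;

	for (;;) {
		long long now, remaining;
		int refcnt = get(data);
		int nap;

		if (refcnt < 0)
			return refcnt;
		if (refcnt == 0)
			return 0;

		now = sketch_now_ms();
		if (now < 0)
			return -errno;

		/* Adjust the timeout by the time already spent waiting. */
		remaining = timeout_ms - (now - start);
		if (remaining <= 0)
			return -ETIMEDOUT;

		/* Sleep for one interval, but never past the deadline. */
		nap = remaining < interval_ms ? (int) remaining : interval_ms;
		if (poll(NULL, 0, nap) < 0 && errno != EINTR)
			return -errno;
	}
}

/* Stand-in reader for the example: counts down by one on every call. */
static int sketch_countdown(void *data)
{
	int *n = data;

	return *n > 0 ? (*n)-- : 0;
}
```

Here poll() is called with no descriptors, so it acts purely as an interruptible sleep. A caller with its own main loop could instead keep a descriptor in the poll set and reuse the same remaining-time calculation.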

Sorry don't follow. And since I have one day before vacation, I suppose
I won't get to this until I get back. But I'd be happy if you massage
it as you see fit as you're used to the code base and I'm sure have
a better idea of what likely is best for the library.


sure, np. I will take a look as time permits.

thanks
Lucas De Marchi


> + ret = kmod_module_get_refcnt_wait(mod, do_remove_patient);

for tool implementation, shouldn't we just ignore
kmod_module_get_refcnt() and proceed to
kmod_module_remove_module_wait()?

I'll let you decide. Otherwise this will have to wait until I get back
from vacation.

Luis