Re: [PATCH -v2 2/2] printk: Add kernel parameter to control writes to /dev/kmsg

From: Ingo Molnar
Date: Fri Jul 01 2016 - 05:10:42 EST



* Borislav Petkov <bp@xxxxxxxxx> wrote:

> +printk_kmsg:
> +
> +Control the logging to /dev/kmsg from userspace:
> +
> +0: default, ratelimited
> +1: unlimited logging to /dev/kmsg from userspace
> +2: logging to /dev/kmsg disabled
> +
> +The kernel command line parameter printk.kmsg= overrides this and is a
> +one-time setting for the duration of the system lifetime: once set, it
> +cannot be changed by this sysctl interface anymore.

Nit:

s/
is a one-time setting for the duration of the system lifetime: once set, it ...
/
is a one-time setting until the next reboot: once set, it ...

As I really hope this bit is not burned permanently into the system hardware! ;-)

> +++ b/include/linux/printk.h
> @@ -171,6 +171,12 @@ extern bool printk_timed_ratelimit(unsigned long *caller_jiffies,
> extern int printk_delay_msec;
> extern int dmesg_restrict;
> extern int kptr_restrict;
> +extern unsigned int devkmsg_log;
> +
> +struct ctl_table;
> +
> +int devkmsg_sysctl_set_loglvl(struct ctl_table *table, int write,
> + void __user *buffer, size_t *lenp, loff_t *ppos);
>
> extern void wake_up_klogd(void);

Nit: I'd make devkmsg_sysctl_set_loglvl() extern as well, to stay consistent with
the surrounding prototypes.

> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -86,6 +86,48 @@ static struct lockdep_map console_lock_dep_map = {
> };
> #endif
>
> +#define DEVKMSG_LOG_RATELIMIT 0
> +#define DEVKMSG_LOG_ON 1
> +#define DEVKMSG_LOG_OFF 2
> +#define DEVKMSG_LOCK (1 << 8)
> +#define DEVKMSG_LOG_MASK (DEVKMSG_LOCK - 1)
> +#define DEVKMSG_LOCKED_MASK ~DEVKMSG_LOG_MASK

Hm, so the state definitions and names here look a bit confusing to me, got a
headache trying to sort through them!

So from a UI point of view, what we want to have is 5 levels:

permanent-off
off
ratelimit
on
permanent-on

Right?

You implemented the 'permanent' bit via an independent LOCK bit in the flags
state. This leaves us:

off
ratelimit
on

... which you implemented via giving two independent bits to 'off' and 'on', and
if neither is set that means 'ratelimit', right?

So the most robust way to define such bitfields is via a pattern like this:

enum devkmsg_log_bits {
__DEVKMSG_LOG_BIT_ON,
__DEVKMSG_LOG_BIT_OFF,
__DEVKMSG_LOG_BIT_LOCK,
};

enum devkmsg_log_masks {
DEVKMSG_LOG_MASK_ON = BIT(__DEVKMSG_LOG_BIT_ON),
DEVKMSG_LOG_MASK_OFF = BIT(__DEVKMSG_LOG_BIT_OFF),
DEVKMSG_LOG_MASK_LOCK = BIT(__DEVKMSG_LOG_BIT_LOCK),
};

/* Keep both the 'on' and 'off' bits clear, i.e. ratelimit by default: */
#define DEVKMSG_LOG_MASK_DEFAULT 0

The double underscore prefixes are there to make sure we never use the bit numbers
directly.

And then all the code becomes pretty self-explanatory IMHO:

> +/* DEVKMSG_LOG_RATELIMIT by default */
> +unsigned int __read_mostly devkmsg_log;

unsigned int __read_mostly devkmsg_log = DEVKMSG_LOG_MASK_DEFAULT;

Note that this initialization would survive any future change of the default.

> +static int __init control_devkmsg(char *str)
> +{
> + if (!str)
> + return -EINVAL;
> +
> + if (!strncmp(str, "on", 2))
> + devkmsg_log = DEVKMSG_LOG_ON;
> + else if (!strncmp(str, "off", 3))
> + devkmsg_log = DEVKMSG_LOG_OFF;
> + else if (!strncmp(str, "ratelimit", 9))
> + devkmsg_log = DEVKMSG_LOG_RATELIMIT;
> + else
> + return -EINVAL;
> +
> + /* Sysctl cannot change it anymore. */
> + devkmsg_log |= DEVKMSG_LOCK;
> +
> + return 0;
> +}
> +__setup("printk.kmsg=", control_devkmsg);

This would become something like:

if (!strncmp(str, "on", 2))
devkmsg_log = DEVKMSG_LOG_MASK_ON;
else if (!strncmp(str, "off", 3))
devkmsg_log = DEVKMSG_LOG_MASK_OFF;
else if (!strncmp(str, "ratelimit", 9))
devkmsg_log = 0;
else
return -EINVAL;

/* Sysctl cannot change it anymore: */
devkmsg_log |= DEVKMSG_LOG_MASK_LOCK;

> +
> +int devkmsg_sysctl_set_loglvl(struct ctl_table *table, int write,
> + void __user *buffer, size_t *lenp, loff_t *ppos)
> +{
> + if (devkmsg_log & DEVKMSG_LOCKED_MASK) {
> + if (write)
> + return -EINVAL;
> + }
> +
> + return proc_dointvec_minmax(table, write, buffer, lenp, ppos);
> +}

This could be written as:

if (devkmsg_log & DEVKMSG_LOG_MASK_LOCK) {


> + /* Ignore when user logging is disabled. */
> + if ((devkmsg_log & DEVKMSG_LOG_MASK) == DEVKMSG_LOG_OFF)
> + return len;

This could be written more clearly as:

if (devkmsg_log & DEVKMSG_LOG_MASK_OFF)
return len;

... note how we don't have to be careful about masking out the locked bit: we just
query the 'off' state bit.

> + /* Ratelimit when not explicitly enabled or when we're not booting. */
> + if ((system_state != SYSTEM_BOOTING) &&
> + ((devkmsg_log & DEVKMSG_LOG_MASK) != DEVKMSG_LOG_ON)) {
> + if (!___ratelimit(&user->rs, current->comm))
> + return ret;
> + }

... and this could be written as:

if ((system_state != SYSTEM_BOOTING) &&
(!(devkmsg_log & DEVKMSG_LOG_MASK_ON)) {

which too is a bit more robust: we already know that user logging is not disabled,
now we check whether it's always-on or ratelimited.

> + .procname = "printk_kmsg",
> + .data = &devkmsg_log,

So I liked the 'devkmsg_log' name, because it clearly tells us that this is about
/dev/kmsg logging state. So I think the visible UI name should be similar:

.procname = "printk_devkmsg",
.data = &devkmsg_log,

... this way people who know the 'devkmsg' string would know what to grep for in
the kernel source - or what flag to search for in /proc/sys/kernel/.

Thanks,

Ingo