Re: [RFC PATCH] coccinelle: misc: add flexible_array.cocci script

From: Gustavo A. R. Silva
Date: Fri Aug 07 2020 - 12:12:38 EST


Hi Denis,

Thanks a lot for working on this. Please see my comments below.

On 8/6/20 17:03, Denis Efremov wrote:
> Commit 68e4cd17e218 ("docs: deprecated.rst: Add zero-length and one-element
> arrays") marks one-element and zero-length arrays as deprecated. Kernel
> code should always use "flexible array members" instead.
>
> The script warns about one-element and zero-length arrays in structs.
>
> Cc: Kees Cook <keescook@xxxxxxxxxxxx>
> Cc: Gustavo A. R. Silva <gustavoars@xxxxxxxxxx>
> Signed-off-by: Denis Efremov <efremov@xxxxxxxxx>
> ---
>
> Currently, it's just a draft. I've placed a number of questions in the
> script and marked them as TODO. Kees, Gustavo, if you could help me with
> my questions I think that this rule will be enough to close:
> https://github.com/KSPP/linux/issues/76
>
> BTW, it's possible to not warn about files in uapi folder if
> this is relevant. Do I need to do it in the script?
>

I think the script should warn about new additions of zero-length/one-element
arrays in UAPI.

> scripts/coccinelle/misc/flexible_array.cocci | 158 +++++++++++++++++++
> 1 file changed, 158 insertions(+)
> create mode 100644 scripts/coccinelle/misc/flexible_array.cocci
>
> diff --git a/scripts/coccinelle/misc/flexible_array.cocci b/scripts/coccinelle/misc/flexible_array.cocci
> new file mode 100644
> index 000000000000..1e7165c79e60
> --- /dev/null
> +++ b/scripts/coccinelle/misc/flexible_array.cocci
> @@ -0,0 +1,158 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +///
> +/// Zero-length and one-element arrays are deprecated, see
> +/// Documentation/process/deprecated.rst
> +/// Flexible-array members should be used instead.
> +///
> +//
> +// Confidence: High
> +// Copyright: (C) 2020 Denis Efremov ISPRAS.
> +// Comments:
> +// Options: --no-includes --include-headers
> +
> +virtual context
> +virtual report
> +virtual org
> +virtual patch
> +
> +@r depends on !patch@
> +identifier name, size, array;
> +// TODO: We can additionally restrict size and array to:
> +// identifier size =~ ".*(num|len|count|size|ncpus).*";
> +// identifier array !~ ".*(pad|reserved).*";
> +// Do we need it?
> +type TS, TA;
> +position p;
> +@@
> +
> +(
> + // This will also match: typedef struct name { ...
> + // However nested structs are not matched, i.e.:
> + // struct name1 { struct name2 { int s; int a[0]; } st; int i; }
> + // will not be matched. Do we need to handle it?

It's fine. I think this would be a different script: one that
looks exclusively for all three (zero-length arrays, one-element
arrays and flexible-array members) in nested structures, because
"A structure containing a flexible array member, or a union
containing such a structure (possibly recursively), may not be
a member of a structure or an element of an array. (However
these uses are permitted by GCC as extensions.)"[1]
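
To make the nested case concrete, here is a minimal sketch. The
struct and function names are hypothetical and are not taken from
the kernel tree. It relies on the GCC extension quoted above, so a
strictly conforming compiler may reject or warn about struct outer.

```c
#include <stddef.h>

/* Hypothetical types for illustration only. */
struct inner {
	int s;
	int a[];	/* flexible-array member: must be last in struct inner */
};

/*
 * ISO C does not allow a struct with a flexible-array member to be a
 * member of another struct; GCC accepts it as an extension (see [1]).
 * It is placed last here so that nothing follows the trailing array.
 */
struct outer {
	int i;
	struct inner st;
};

/* Number of bytes of struct outer that precede the trailing array st.a. */
size_t outer_header_size(void)
{
	return offsetof(struct outer, st) + offsetof(struct inner, a);
}
```

A script for nested structures would have to match the inner
declaration and the outer one together, which the rule in this
patch does not attempt.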

> + struct name {
> + ... // TODO: Maybe simple ... is enough? It will match structs with a

Yep; simple is always better at first. :)

> + TS size; // single field, e.g.
> + ... // https://elixir.bootlin.com/linux/v5.8/source/arch/arm/include/uapi/asm/setup.h#L127
> +(
> +* TA array@p[0];
> +|
> + // TODO: It seems that there are exception cases for array[1], e.g.
> + // https://elixir.bootlin.com/linux/v5.8/source/arch/powerpc/boot/rs6000.h#L152
> + // https://elixir.bootlin.com/linux/v5.8/source/include/uapi/linux/cdrom.h#L292
> + // https://elixir.bootlin.com/linux/v5.8/source/drivers/net/wireless/ath/ath6kl/usb.c#L108
> + // We could either drop array[1] checking from this rule or
> + // restrict array name with regexp and add, for example, an "allowlist"
> + // with struct names where we allow this code pattern.
> + // TODO: How to handle: u8 data[1][MAXLEN_PSTR6]; ?
> +* TA array@p[1];
> +)
> + };
> +|
> + struct {
> + ...
> + TS size;
> + ...
> +(
> +* TA array@p[0];
> +|
> +* TA array@p[1];
> +)
> + };
> +|
> + // TODO: do we need to handle unions?

Yep; we should warn about this in unions, too.

However, I think unions cannot have members with
incomplete type, so we should not suggest the use
of flexible-array members in unions, because
flexible arrays have incomplete type.
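
As a rough illustration of the difference (hypothetical names, not
kernel code): the union form is left in a comment because the
compilers tested in [2] and [3] reject it, while the same trailing
array is accepted in a struct.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Not compiled: a flexible-array member declared directly in a union
 * is rejected by the compilers tested in [2] and [3].
 *
 *	union bad {
 *		int size;
 *		char array[];
 *	};
 */

/* Hypothetical struct: the same trailing array is fine in a struct. */
struct msg {
	size_t len;
	unsigned char data[];	/* incomplete type, adds no storage to sizeof */
};

/* Allocate a struct msg with room for n payload bytes; NULL on failure. */
struct msg *msg_alloc(size_t n)
{
	struct msg *m;

	if (n > SIZE_MAX - sizeof(*m))	/* the size computation would overflow */
		return NULL;
	m = malloc(sizeof(*m) + n);
	if (m)
		m->len = n;
	return m;
}
```

So for unions the script can still warn in report mode, but the
patch-mode rewrite to "TA array[];" would not be a valid suggestion.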

> + union name {
> + ...
> + TS size;
> + ...
> +(
> +* TA array@p[0];
> +|
> +* TA array@p[1];
> +)
> + };
> +|
> + union {
> + ...
> + TS size;
> + ...
> +(
> +* TA array@p[0];
> +|
> +* TA array@p[1];
> +)
> + };
> +)
> +
> +// FIXME: Patch mode doesn't work as expected.
> +// Coccinelle handles formatting incorrectly.
> +// Patch mode in this rule should be disabled until
> +// proper formatting will be supported.
> +@depends on patch exists@
> +identifier name, size, array;
> +type TS, TA;
> +@@
> +
> +(
> + struct name {
> + ...
> + TS size;
> + ...
> +(
> +- TA array[0];
> +|
> +- TA array[1];
> +)
> ++ TA array[];
> + };
> +|
> + struct {
> + ...
> + TS size;
> + ...
> +(
> +- TA array[0];
> +|
> +- TA array[1];
> +)
> ++ TA array[];
> + };
> +|
> + union name {
> + ...
> + TS size;
> + ...
> +(
> +- TA array[0];
> +|
> +- TA array[1];
> +)
> ++ TA array[];
> + };
> +|
> + union {
> + ...
> + TS size;
> + ...
> +(
> +- TA array[0];
> +|
> +- TA array[1];
> +)
> ++ TA array[];

This is not allowed, neither in GCC[2] nor in Clang[3].

> + };
> +)
> +
> +@script: python depends on report@
> +p << r.p;
> +@@
> +
> +msg = "WARNING: use flexible-array member instead"
> +coccilib.report.print_report(p[0], msg)
> +
> +@script: python depends on org@
> +p << r.p;
> +@@
> +
> +msg = "WARNING: use flexible-array member instead"
> +coccilib.org.print_todo(p, msg)
>

I wonder if it might be worth it to also point people to
the documentation in deprecated.rst (commit 68e4cd17e218
("docs: deprecated.rst: Add zero-length and one-element arrays")),
once helpdesk generates the official documentation for 5.9-rc1.

Thanks
--
Gustavo

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://godbolt.org/z/Kajd7e
[3] https://godbolt.org/z/dvKMYb