Re: [Cocci] coccinelle: Convert comma to semicolons (was Re: [PATCH] checkpatch: Add test for comma use that should be semicolon)

From: Joe Perches
Date: Thu Aug 20 2020 - 12:52:48 EST


On Thu, 2020-08-20 at 10:33 +0200, Julia Lawall wrote:
> On Wed, 19 Aug 2020, Joe Perches wrote:
> > On Wed, 2020-08-19 at 14:22 -0700, Joe Perches wrote:
> > > There are commas used as statement terminations that should typically
> > > have used semicolons instead. Only direct assignments or use of a single
> > > function or value on a single line are detected by this test.
> > >
> > > e.g.:
> > > foo = bar(), /* typical use is semicolon not comma */
> > > bar = baz();
> > >
> > > Add an imperfect test to detect these comma uses.
> > >
> > > No false positives were found in testing, but many types of false negatives
> > > are possible.
> > >
> > > e.g.:
> > > foo = bar() + 1, /* comma use, but not direct assignment */
> > > bar = baz();
> >
> > Hi.
> >
> > I recently added a test for this condition to linux's checkpatch.
> >
> > A similar coccinelle script might be:
> >
> > $ cat comma.cocci
> > @@
> > expression e1;
> > expression e2;
> > @@
> >
> > e1
> > - ,
> > + ;
> > e2;
> > $
> >
> > This works reasonably well but it has several false positives
> > for declarations like:
> >
> > $ spatch --sp-file comma.cocci mm/huge_memory.c
> > diff -u -p a/huge_memory.c b/huge_memory.c
> > --- a/huge_memory.c
> > +++ b/huge_memory.c
> > @@ -2778,7 +2778,7 @@ static unsigned long deferred_split_scan
> > struct pglist_data *pgdata = NODE_DATA(sc->nid);
> > struct deferred_split *ds_queue = &pgdata->deferred_split_queue;
> > unsigned long flags;
> > - LIST_HEAD(list), *pos, *next;
> > + LIST_HEAD(list), *pos; *next;
> > struct page *page;
> > int split = 0;
> > $
> >
> > Any script improvement suggestions?
>
> I have a bunch of variations of this that are more complicated than I
> would have expected. One shorter variant that I have is:
>
> @@
> expression e1,e2;
> statement S;
> @@
>
> S
> e1
> -,
> +;
> (<+... e2 ...+>);
>
> This will miss cases where the first statement is the comma thing. But I
> think it is possible to improve this now. I will check.

Hi Julia.

Right, thanks, this adds a dependency on a statement
before the expression. Any stragglers would be easily
found using a slightly different form.
There are not very many of these in the Linux kernel.

Another nicety would be to allow the s/,/;/ conversion to
find both b and c in this sequence:
	a = 1;
	b = 2,
	c = 3,
	d = 4;
without running the script multiple times.
There are many dozens of uses of this style in the Linux kernel.
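
For reference, a minimal C sketch of why the conversion is safe for
this particular pattern: the comma operator evaluates its left operand
before its right one, so a chain of plain assignments used as a full
statement should behave the same as the semicolon-separated form. The
struct and function names here are hypothetical, made up only for
illustration.

```c
#include <string.h>

/* Hypothetical container for the four variables in the example. */
struct vals {
	int a, b, c, d;
};

/*
 * Original style: b and c end with a comma, so
 * "v.b = 2, v.c = 3, v.d = 4;" is one expression statement.
 */
struct vals with_commas(void)
{
	struct vals v;

	memset(&v, 0, sizeof(v));
	v.a = 1;
	v.b = 2,
	v.c = 3,
	v.d = 4;
	return v;
}

/* Converted style: every assignment is its own statement. */
struct vals with_semicolons(void)
{
	struct vals v;

	memset(&v, 0, sizeof(v));
	v.a = 1;
	v.b = 2;
	v.c = 3;
	v.d = 4;
	return v;
}

/* Returns 1 when both styles yield the same values, 0 otherwise. */
int same_result(void)
{
	struct vals x = with_commas();
	struct vals y = with_semicolons();

	return x.a == y.a && x.b == y.b && x.c == y.c && x.d == y.d;
}
```

This only covers simple assignments at statement level. It says
nothing about commas inside for loops, declarations or macro
arguments, where a blind replacement would be wrong.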

I tried variants of adding a comma after the e2 expression,
but cocci seems to have parsing problems with:

@@
expression e1;
expression e2;
@@
e1
- ,
+ ;
e2,

I do appreciate that coccinelle adds braces when converting a
comma-separated list of expressions that follows an if.

i.e.:
	if (foo)
		a = 1, b = 2;
becomes
	if (foo) {
		a = 1; b = 2;
	}

There are a few dozen uses of this style in the Linux kernel.
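
A minimal C sketch of why the braces matter in the if case, assuming a
purely textual s/,/;/ with no braces added. The function names are
hypothetical and the return value just packs a and b into one int so
the three variants can be compared.

```c
/* Comma form: both assignments are controlled by the if. */
int if_comma(int foo)
{
	int a = 0;
	int b = 0;

	if (foo)
		a = 1, b = 2;
	return a * 10 + b;
}

/* Braced form, matching the converted code above: same behavior. */
int if_braced(int foo)
{
	int a = 0;
	int b = 0;

	if (foo) {
		a = 1; b = 2;
	}
	return a * 10 + b;
}

/*
 * Naive replacement without braces. "if (foo) a = 1; b = 2;" is
 * written out here the way the compiler reads it: only a = 1 is
 * conditional and b = 2 always runs.
 */
int if_naive(int foo)
{
	int a = 0;
	int b = 0;

	if (foo)
		a = 1;
	b = 2;
	return a * 10 + b;
}
```

With foo == 0 the first two variants return 0 while the naive one
returns 2, so a conversion without braces would change behavior.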