git grep/sed to standardize "/* SPDX-License-Identifier: <license>"

From: Joe Perches
Date: Tue Oct 06 2020 - 19:15:31 EST


Almost all source files in the kernel use a standardized SPDX header
at line 1 with a comment /* initiator and terminator */:

/* SPDX-License-Identifier: <license> */

$ git grep -PHn '^/\* SPDX-License-Identifier:.*\*/\s*$' | \
wc -l
17847

$ git grep -PHn '^/\* SPDX-License-Identifier:.*\*/\s*$' | \
grep ":1:" | cut -f1 -d":" | grep -oP '\.\w+$' | \
sort | uniq -c | sort -rn
16769 .h
972 .S
87 .c
6 .lds
3 .l
2 .y
2 .py
2 .dtsi
1 .sh
1 .dts
1 .cpp
1 .bc

But about 2% of the files do not terminate the comment at
line 1 and use either:

/* SPDX-License-Identifier: <license>
* additional comment or blank

or

/* SPDX-License-Identifier: <license>
<blank line>

$ git grep -PHn '^/\* SPDX-License-Identifier:(?!.*\*/\s*$)' | \
wc -l
407

$ git grep -PHn '^/\* SPDX-License-Identifier:(?!.*\*/\s*$)' | \
grep '\:1:' | cut -f1 -d':' | grep -oP '\.\w+$' | \
sort | uniq -c | sort -rn
357 .h
34 .S
16 .c
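
For anyone not used to PCRE lookaheads, here is a small sketch of what
the pattern above selects. The sample lines are made up for
illustration, and it assumes a grep built with -P support:

```shell
# Made-up sample lines; only the second one lacks a closing "*/".
# The third has trailing whitespace after its "*/".
samples=$(printf '%s\n' \
    '/* SPDX-License-Identifier: GPL-2.0 */' \
    '/* SPDX-License-Identifier: GPL-2.0' \
    '/* SPDX-License-Identifier: MIT */  ')

# (?!...) is a negative lookahead: the line matches only when the
# text after "SPDX-License-Identifier:" does NOT end in "*/"
# followed by optional trailing whitespace.
unterminated=$(printf '%s\n' "$samples" | \
    grep -P '^/\* SPDX-License-Identifier:(?!.*\*/\s*$)')

printf '%s\n' "$unterminated"
# prints: /* SPDX-License-Identifier: GPL-2.0
```

The third sample is not reported because \s*$ accepts whitespace after
the terminator, which matches how the first pattern counts it.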

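Here's a trivial script to standardize the first and second lines
of these 407 files, which makes them easier to categorize and sort.
It terminates the comment on line 1 with " */" and starts line 2
with "/*" in place of a leading " *" (or at the start of a blank line).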

$ git grep -PHn '^/\* SPDX-License-Identifier:(?!.*\*/\s*$)' | \
grep ':1:' | cut -f1 -d":" | \
xargs sed -i -e '1s@[[:space:]]*$@ */@' -r -e '2s@^( \*|)@/*@'
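
To see the effect before touching a tree, the same two sed expressions
can be tried on a scratch file. This is only a sketch: the file name
and contents are hypothetical, and GNU sed is assumed for -i and -r.

```shell
# Hypothetical scratch file standing in for one of the 407 headers.
tmpdir=$(mktemp -d)
demo="$tmpdir/demo.h"
printf '%s\n' \
    '/* SPDX-License-Identifier: GPL-2.0' \
    ' * Copyright line' \
    ' */' > "$demo"

# Same expressions as in the command above:
#   line 1: replace trailing whitespace (possibly none) with " */"
#   line 2: replace a leading " *" (or the empty start) with "/*"
sed -i -e '1s@[[:space:]]*$@ */@' -r -e '2s@^( \*|)@/*@' "$demo"

result=$(cat "$demo")
rm -rf "$tmpdir"

printf '%s\n' "$result"
# prints:
# /* SPDX-License-Identifier: GPL-2.0 */
# /* Copyright line
#  */
```

With GNU sed, -i treats each input file separately, so the 1 and 2
addresses apply to the first two lines of every file that xargs passes
in, not just the first one.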