Re: [PATCH 0/9] powerpc: delete duplicated words

From: Joe Perches
Date: Sun Jul 26 2020 - 17:00:02 EST


On 2020-07-26 12:08, Randy Dunlap wrote:
On 7/26/20 10:49 AM, Joe Perches wrote:
On Sun, 2020-07-26 at 10:23 -0700, Randy Dunlap wrote:
On 7/26/20 7:29 AM, Christophe Leroy wrote:
Randy Dunlap <rdunlap@xxxxxxxxxxxxx> wrote:

Drop duplicated words in arch/powerpc/ header files.

How did you detect them? Do you have some script for that, or did you just read all the comments?

Yes, it's a script that finds lots of false positives, so I have to check
each and every one of them for validity.

And it's a lot of work too. (thanks Randy)

It could be something like:

$ grep-2.5.4 -nrP --include=*.[ch] '\b([A-Z]?[a-z]{2,}\b)[ \t]*(?:\n[ \t]*\*[ \t]*|)\1\b' * | \
grep -vP '\b(?:struct|enum|union)\s+([A-Z]?[a-z]{2,})\s+\*?\s*\1\b' | \
grep -vP '\blong\s+long\b' | \
grep -vP '\b([A-Z]?[a-z]{2,})(?:\t+| {2,})\1\b'

Hi Joe,

Hi Randy

(what is grep-2.5.4 ?)

It's the last version of grep that allowed spanning multiple lines.

That's to find the second lines of comments, the ones that start with *
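For anyone without an old grep at hand, here is a minimal sketch of the same search in Python. It assumes the whole file is read into one string so the "\n" in the pattern can match a real line break. The function name find_duplicated_words and the sample text are made up for illustration. Python's re module is not PCRE, but it supports the constructs used in this expression.

```python
import re

# Same expression as the first grep in the pipeline above. Applied to a whole
# string, the optional "\n[ \t]*\*[ \t]*" part lets a word at the end of one
# comment line pair up with the first word after the "*" on the next line.
DUP_WORD = re.compile(
    r'\b([A-Z]?[a-z]{2,}\b)[ \t]*(?:\n[ \t]*\*[ \t]*|)\1\b'
)

def find_duplicated_words(text):
    """Return (line_number, word) for each duplicated word found in text."""
    hits = []
    for m in DUP_WORD.finditer(text):
        # Line number of the first word of the pair, counted from 1.
        line_number = text.count('\n', 0, m.start()) + 1
        hits.append((line_number, m.group(1)))
    return hits

# Hypothetical sample: one duplicate split across two comment lines and one
# duplicate on a single line.
sample = (
    "/*\n"
    " * Check whether the\n"
    " * the page is mapped, so that that the caller can retry.\n"
    " */\n"
)

print(find_duplicated_words(sample))  # [(2, 'the'), (3, 'that')]
```

The trailing \b keeps something like "is is_mapped" from being reported, since the second "is" is only the start of a longer identifier.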

It looks like you tried a few iterations of this -- since it drops things
like "long long". There are lots of data types that are repeated & valid.
And many struct names, like "struct kref kref", "struct completion completion",
and "struct mutex mutex". I handle (ignore) those manually.

That's the first exclude pattern.
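To make the three exclude patterns concrete, here is a rough Python sketch that applies them to a few candidate lines. The expressions are copied from the grep -vP stages above. The helper name is_false_positive and the candidate lines are invented for illustration only.

```python
import re

# The three "grep -vP" filters from the pipeline, in the same order.
EXCLUDES = [
    # "struct kref kref", "struct mutex *mutex", and similar declarations
    re.compile(r'\b(?:struct|enum|union)\s+([A-Z]?[a-z]{2,})\s+\*?\s*\1\b'),
    # the "long long" data type
    re.compile(r'\blong\s+long\b'),
    # a word repeated after tabs or 2+ spaces, typically column-aligned text
    re.compile(r'\b([A-Z]?[a-z]{2,})(?:\t+| {2,})\1\b'),
]

def is_false_positive(line):
    """True if any exclude pattern matches, so the line would be dropped."""
    return any(pattern.search(line) for pattern in EXCLUDES)

# Hypothetical candidate lines, as the first grep might report them.
candidates = [
    "struct kref kref;",
    "struct mutex *mutex;",
    "unsigned long long mask;",
    "#define barrier\tbarrier()",
    " * flush the the TLB",
]

remaining = [line for line in candidates if not is_false_positive(line)]
print(remaining)  # [' * flush the the TLB']
```

Only the real duplicate survives here. Repeated type names outside struct/enum/union would still get through and need a manual check, as noted above.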