Re: [PATCH v2] checkpatch: look for common misspellings

From: Joe Perches
Date: Wed Sep 10 2014 - 03:00:59 EST


On Wed, 2014-09-10 at 13:37 +0900, Masanari Iida wrote:
> Hello Joe, Kees,

Hello Masanari-san.

> Sorry for late reply.
> I was on holiday when the version 1 patch discussions were posted.

No worries, holidays are far more important
than patches like this...

These patches are simple niceties, not fixes
for bugs, so review and acceptance timing is
not urgent.

> I am using codespell ( https://github.com/lucasdemarchi/codespell/ ).
> The codespell has its own typo dictionary.
> The dictionary format is
>
> typo->good (1 candidate)
> typo->good1,good2, (multiple candidates)
> typo->good, comment (1 candidate with special remark)
>
> It's similar to your typo||good format.
>
> The license of codespell is GPLv2 according to the COPYING file in the tarball.
>
> Comparing the number of typo samples in each dictionary:
> Your dictionary: 1033
> codespell-1.4: 4261
> codespell-1.4 + my additions: 5245
> Your dictionary + codespell-1.4 + my additions, duplicates removed: 5742
>
> Latest version of codespell is 1.7.
> My dictionary is based on codespell-1.4. So I use the number as of 1.4.
>
> I can provide my typo samples under GPLv2 license.

Thanks.

Any additions you have to the dictionary would be
gladly welcomed.

Using a common format for the dictionary and any
suggested corrections would be good too.

Maybe the dictionary and code should be changed to
use the codespell format. It seems a bit more
flexible than the lintian format.
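
To make the difference concrete, here is a rough sketch
of reading one codespell-style line and writing it back
out in the typo||good form. It is Python only for
illustration, not checkpatch's perl. It assumes, going
by the description quoted above, that a trailing comma
marks multiple candidates and that text after the last
comma is otherwise a remark. The function names are
made up for this example.

```python
def parse_codespell_line(line):
    """Split 'typo->good', 'typo->good1,good2,' or 'typo->good, comment'.

    Returns (typo, candidates, remark). Assumption: a trailing comma
    means "multiple candidates, no remark"; otherwise anything after
    the last comma is treated as a remark.
    """
    typo, sep, rest = line.strip().partition("->")
    if not sep or not typo:
        raise ValueError("not a typo->good line: %r" % line)
    rest = rest.strip()
    remark = ""
    if rest.endswith(","):
        # Multiple candidates, terminated by a comma.
        candidates = [c.strip() for c in rest.split(",") if c.strip()]
    elif "," in rest:
        # Candidate(s) followed by a free-form remark.
        head, _, remark = rest.rpartition(",")
        candidates = [c.strip() for c in head.split(",") if c.strip()]
        remark = remark.strip()
    else:
        candidates = [rest]
    return typo.strip(), candidates, remark


def to_double_bar(typo, candidates):
    """Emit the single-candidate 'typo||good' form (first candidate only)."""
    return "%s||%s" % (typo, candidates[0])
```

The conversion is lossy in one direction: the typo||good
form keeps a single correction, so extra candidates and
remarks are dropped here.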

I do not know if one project is more active than
the other, but perhaps that should be the deciding
factor. Or maybe just Kees' preference...

Merging all of these dictionaries together might
not be a good solution though.

Right now, the checkpatch spelling code treats an
underscore as a word boundary.

So the spelling test is applied to each of the 4
segments of a #define like "PREFIX_PREFERED_SEG_ABC",
and it finds the misspelling PREFERED.
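
A minimal sketch of that behaviour, again in Python
rather than the actual checkpatch perl. It assumes any
non-letter (underscore, digit, space, punctuation) ends
a word, and it uses a one-entry dictionary invented for
this example.

```python
import re

# Illustrative one-entry dictionary; the real list is far larger.
MISSPELLINGS = {"prefered": "preferred"}


def find_misspellings(text, dictionary=MISSPELLINGS):
    """Return (word, suggestion) pairs for each misspelled segment.

    Assumption: every run of ASCII letters is one word, so '_' splits
    an identifier into separately checked segments.
    """
    hits = []
    for word in re.split(r"[^A-Za-z]+", text):
        fix = dictionary.get(word.lower())
        if fix:
            hits.append((word, fix))
    return hits
```

With this, find_misspellings("#define PREFIX_PREFERED_SEG_ABC 1")
returns [("PREFERED", "preferred")]: the segments PREFIX,
SEG and ABC are checked too, but only PREFERED matches.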

Because of that, the dictionary still needs some
sifting to drop entries that are also common
identifier prefixes, otherwise there are too many
false positives.

For example, "ths" was dropped because it's a prefix
used by several modules even though it's a somewhat
frequent typo.
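
One way such sifting could be automated, as a sketch
only: count how often each dictionary typo shows up as
the leading segment of an identifier and drop the ones
used too often. The threshold, the identifier names in
the usage note and the "ths" correction are assumptions
made up for this example. The actual sifting was not
necessarily done this way.

```python
from collections import Counter


def sift_dictionary(dictionary, identifiers, max_prefix_uses=1):
    """Drop typos that are commonly used as identifier prefixes.

    'identifiers' is any iterable of names such as function or macro
    names. A typo is kept only if it is the leading '_'-separated
    segment of at most max_prefix_uses identifiers.
    """
    prefix_uses = Counter(
        name.lower().split("_", 1)[0]
        for name in identifiers
        if "_" in name
    )
    return {
        typo: fix
        for typo, fix in dictionary.items()
        if prefix_uses[typo] <= max_prefix_uses
    }
```

For instance, with a dictionary of {"ths": "this",
"prefered": "preferred"} and the hypothetical
identifiers ths_init, ths_read and ths_write, the "ths"
entry is dropped and "prefered" is kept.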

