Re: [PATCH V2] scripts/spdxcheck.py: Strictly read license files in utf-8

From: Jonathan Corbet
Date: Mon Jul 12 2021 - 11:58:59 EST


Nishanth Menon <nm@xxxxxx> writes:

> Commit bc41a7f36469 ("LICENSES: Add the CC-BY-4.0 license")
> unfortunately introduced LICENSES/dual/CC-BY-4.0 in UTF-8 Unicode text,
> and python will barf at it with:
>
> FAIL: 'ascii' codec can't decode byte 0xe2 in position 2109: ordinal not in range(128)
> Traceback (most recent call last):
>   File "scripts/spdxcheck.py", line 244, in <module>
>     spdx = read_spdxdata(repo)
>   File "scripts/spdxcheck.py", line 47, in read_spdxdata
>     for l in open(el.path).readlines():
>   File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
>     return codecs.ascii_decode(input, self.errors)[0]
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 2109: ordinal not in range(128)
>
> While it is indeed debatable whether 'Licensor.' in the license file
> needs Unicode quotes, force spdxcheck to read utf-8 instead.
>
> Reported-by: Rahul T R <r-ravikumar@xxxxxx>
> Signed-off-by: Nishanth Menon <nm@xxxxxx>
> Reviewed-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
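
For illustration, a minimal sketch of the idea in the patch: pass `encoding="utf-8"` to `open()` so decoding no longer depends on the locale's default encoding (which can be ASCII, as in the Python 3.6 traceback). The helper `read_license_lines()` and the sample file contents are hypothetical, not code taken from scripts/spdxcheck.py. The byte 0xe2 in the error is the lead byte of UTF-8 encoded typographic quotes such as U+2019 (E2 80 99).

```python
import os
import tempfile


def read_license_lines(path):
    # Hypothetical helper: decode strictly as UTF-8 instead of relying on
    # the locale default, which may be ASCII (e.g. under LANG=C).
    with open(path, encoding="utf-8") as fd:
        return fd.readlines()


# Illustrative license file containing a non-ASCII quote (U+2019,
# encoded as 0xE2 0x80 0x99 in UTF-8).
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "CC-BY-4.0")
with open(path, "w", encoding="utf-8") as fd:
    fd.write("Valid-License-Identifier: CC-BY-4.0\n")
    fd.write("\u2019Licensor.\u2019\n")

lines = read_license_lines(path)

# Decoding the same bytes as ASCII raises the same kind of error
# as the one reported in the patch.
try:
    with open(path, encoding="ascii") as fd:
        fd.readlines()
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True
```

Using `with` also closes the file deterministically, whereas the `open(el.path).readlines()` form in the traceback leaves closing to the interpreter.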

I've applied this, thanks.

jon