Re: [PATCH] kunit: tool: continue past invalid utf-8 output

From: Brendan Higgins
Date: Fri Oct 08 2021 - 17:15:40 EST


On Fri, Oct 8, 2021 at 2:08 PM Daniel Latypov <dlatypov@xxxxxxxxxx> wrote:
>
> kunit.py currently crashes and fails to parse kernel output if it's not
> fully valid utf-8.
>
> This can come from memory corruption or from inadvertently printing
> out binary data as strings.
>
> E.g. adding this line into a kunit test
> pr_info("\x80")
> will cause this exception
> UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 1961: invalid start byte
>
> We can tell Python how to handle errors, see
> https://docs.python.org/3/library/codecs.html#error-handlers
>
> Unfortunately, it doesn't seem like there's a way to specify this in
> just one location, so we need to repeat ourselves quite a bit.
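
For anyone following along, here is a minimal sketch of why the option has to be repeated: each place that turns bytes into str takes its own `errors=` argument. The child command and the temporary file below are made up for illustration and are not the actual call sites in kunit.py.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical stand-in for kernel output: one valid line, then a stray 0x80 byte.
payload = b"ok 1\n\x80\n"

# Decode site 1: reading a child process's stdout as text.
child = [sys.executable, "-c",
         "import sys; sys.stdout.buffer.write(b'ok 1\\n\\x80\\n')"]
proc = subprocess.run(child, stdout=subprocess.PIPE,
                      encoding="utf-8", errors="backslashreplace")
piped = proc.stdout

# Decode site 2: reading a saved log file as text.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    path = tmp.name
with open(path, encoding="utf-8", errors="backslashreplace") as log:
    from_file = log.read()
os.unlink(path)
```

Both reads produce the same text, but only because the handler was passed at each site separately.
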
>
> Specify `errors='backslashreplace'` so we instead:
> * print out the offending byte as '\x80'
> * try and continue parsing the output.
> * as long as the TAP lines themselves are valid, we're fine.
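
A minimal sketch of the behavior described in the list above, assuming a made-up byte string in place of real kernel output (the test names are hypothetical):

```python
# Hypothetical captured output: two TAP result lines with a stray 0x80 byte between them.
raw = b"ok 1 - example_test\n\x80 stray byte\nok 2 - other_test\n"

# Default (strict) decoding raises on the invalid start byte.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# With backslashreplace, the bad byte becomes the literal text \x80
# and decoding continues through the rest of the output.
text = raw.decode("utf-8", errors="backslashreplace")
lines = text.splitlines()

# The TAP lines themselves are untouched, so a parser can still pick them out.
tap_lines = [line for line in lines if line.startswith("ok ")]
```

The offending byte shows up in `lines[1]` as four printable characters, and both `ok` lines survive intact.
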
>
> Signed-off-by: Daniel Latypov <dlatypov@xxxxxxxxxx>

Thanks for fixing this!

Reviewed-by: Brendan Higgins <brendanhiggins@xxxxxxxxxx>