Re: [PATCH 2/3] gen_compile_commands: wire up build rule to Makefile

From: Masahiro Yamada
Date: Wed Aug 19 2020 - 00:30:27 EST


Hi Nick,

On Fri, Aug 14, 2020 at 2:10 AM Masahiro Yamada <masahiroy@xxxxxxxxxx> wrote:
>
> On Thu, Aug 13, 2020 at 7:30 AM 'Nick Desaulniers' via Clang Built
> Linux <clang-built-linux@xxxxxxxxxxxxxxxx> wrote:
> >
> > On Wed, Aug 12, 2020 at 10:40 AM Masahiro Yamada <masahiroy@xxxxxxxxxx> wrote:
> > >
> > > Currently, you need to explicitly run scripts/gen_compile_commands.py
> > > to create compile_commands.json. It traverses the object tree
> > > (you need to pass the -d option to deal with a separate output tree),
> > > and parses all the .*.cmd file found.
> > >
> > > If you rebuild the kernel over again without 'make clean', stale
> > > .*.cmd files from older builds will create invalid entries in
> > > compile_commands.json.
> >
> > Definitely a problem; happy to see compile_commands.json added to
> > `make clean` target, too.
> >
> > >
> > > This commit wires up the compile_commands.json rule to the top
> > > Makefile, and makes it parse .*.cmd files only from the current build
> > > to avoid stale entries.
> > >
> > > It is possible to extract only relevant .*.cmd files by checking
> > > $(KBUILD_VMLINUX_OBJS), $(KBUILD_VMLINUX_LIBS), and modules.order.
> > > The objects or archives linked to vmlinux are listed in
> > > $(KBUILD_VMLINUX_OBJS) or $(KBUILD_VMLINUX_LIBS). All the modules are
> > > listed in modules.order.
> > >
> > > You can create compile_commands.json from Make:
> > >
> > > $ make -j$(nproc) CC=clang compile_commands.json
> > >
> > > Of course, you can build vmlinux, modules, and compile_commands.json
> > > all together in a single command:
> > >
> > > $ make -j$(nproc) CC=clang all compile_commands.json
> > >
> > > It works also for M= builds. In this case, compile_commands.json
> > > is created in the top directory of the external module.
> > >
> > > I hope this will be overall improvements, but it has a drawback;
> > > the coverage of the compile_commands.json is reduced because only
> > > the objects linked to vmlinux or modules are handled. For example,
> > > the following C files are not included in compile_commands.json:
> > >
> > > - Decompressor source files (arch/*/boot/compressed/)
> > > - VDSO source files
> > > - C files used to generate intermediates (e.g. kernel/bounds.c)
> > > - standalone host programs
> >
> > Oof, for an x86_64 defconfig, the difference in line count of
> > compile_commands.json
> > before: 12826
>
>
> I think some lines of 'before'
> are not so important.
>
> Files suffixed with *.mod.c
> are generated sources for modules.
> There is no point to check them by Clang tools.
>
>
> Some entries appear twice:
>
> For example, 'before' contains two entries of
> "file": "lib/cmdline.c"
> Which entry is used by 'clang-tidy lib/cmdline.c',
> the first one, the second one, or both?
>
>
>
> Having said that, there is still a loss of more than 3%, yes.
>
>
> > after: 12351
> >
> > That's a loss of 475 (3.7% of 12826) coverage. Is there something more
> > we can do to preserve this functionality, while avoiding stale .cmd
> > files?
>
>
> I have no idea how to do this correctly.
>
> > Is it that those aren't specified by `$(KBUILD_VMLINUX_OBJS)
> > $(KBUILD_VMLINUX_LIBS)` ?
>
> These variables contain only objects and archives
> linked to vmlinux.
>
>
>
>
> For example, VDSO is built as a prerequisite of
> another object that wraps it.
>
> See line 61 of arch/arm64/kernel/vdso/Makefile:
> $(obj)/vdso.o : $(obj)/vdso.so
>
>
> I do not know how to get the full list of active objects,
> some of which are built on demand
> in the dependency chain.
>
>
> Idea 1)
> Merge this series, and accept the loss.
>
>
> Idea 2)
> Add Makefile targets,
> and also keep the previous work-flow.
>
> When you run it from Make,
> only objects for vmlinux and modules are handled.
>
> When you need the full coverage, including non-kernel-space
> sources, run scripts manually:
>
> $ scripts/clang-tools/gen_compile_commands.py
> $ scripts/clang-tools/run-clang-tools.py clang-tidy



Do you have any idea to cope with
the 3% loss problem?

If it is a problem, maybe I should try Idea 2).


Thanks.








>
> Idea 3)
> Give up supporting it from Makefile.
> Instead, improve gen_scripts_commands.py
> as a standalone program.
>
>
> Maybe we can check whether the compiler is Clang or not.
> We can run '<compiler> --version' and drop the
> entry if it is GCC.
>
> Usually, the compiler is the first word of
> the "command" field in compile_commands.json,
> but there are exceptions because
> people may do CC="ccache clang".
>
>
> If there are still stale entries causing troubles,
> you need to run 'make clean', and rebuild the tree.
>
>
> We were trying to have separate scripts,
> gen_compile_commands.py and run-clang-tools.py,
> and to add Makefile targets to run them in a row.
>
> I think unifying the two scripts
> might be handier.
>
>
> Add two options, -t, -a,
> to scripts/gen_compile_commands.py
>
> If they are given,
> scripts/gen_compilile_commands.py
> generates compile_commands.json,
> and immediately runs clang-tidy against it.
>
>
> -t, --tidy
> Run 'clang-tidy -checks=-*,linuxkernel-*' after generating
> compilation database
> -a, --analyzer
> Run 'clang-tidy -checks=-*,clang-analyzer-*' after generating
> compilation database
>
>
> Both -a and -t are given,
> it runs
> 'clang-tidy -checks=-*,linuxkernel-*,clang-analyzer-*'
>
> This works more efficiently
> if you want to check everything.
>
>
> 'make clang-tidy clang-analyzer'
> will invoke clang-tidy twice for each file,
> which is not very efficient.
>
>
>
>
> > > clean-dirs := $(KBUILD_EXTMOD)
> > > -clean: rm-files := $(KBUILD_EXTMOD)/Module.symvers $(KBUILD_EXTMOD)/modules.nsdeps
> > > +clean: rm-files := $(KBUILD_EXTMOD)/Module.symvers $(KBUILD_EXTMOD)/modules.nsdeps \
> > > + $(KBUILD_EXTMOD)/compile_commands.json
> >
> > So the `clean` target doesn't make use of `CLEAN_FILES`? It looks like
> > there's some duplication there? Oh, this is dependent on
> > !KBUILD_EXTMOD, and is a new `clean` target. Do I understand that
> > correctly?
>
> Correct.
>
> We can move CLEAN_FILES to a common part
> so external module builds can use it.
>
> > > """
> > > usage = 'Creates a compile_commands.json database from kernel .cmd files'
> > > parser = argparse.ArgumentParser(description=usage)
> > >
> > > - directory_help = ('Path to the kernel source directory to search '
> > > - '(defaults to the working directory)')
> > > - parser.add_argument('-d', '--directory', type=str, help=directory_help)
> > > + ar_help = 'command used for parsing .a archives'
> > > + parser.add_argument('-a', '--ar', type=str, default='ar', help=ar_help)
> >
> > Might be nice to warn if run with no arguments? In case someone does:
> > $ ./scripts/clang-tools/gen_compile_commands.py
>
> Right.
>
> nargs='+' seems to work.
>
>
>
> > > + # Collect objects compiled for vmlinux or modules
> > > + objects = []
> > > + for file in files:
> > > + if file.endswith('.o'):
> > > + # Some objects (head-y) are linked to vmlinux directly
> > > + objects.append(file)
> > > + elif file.endswith('.a'):
> > > + # Most of built-in objects are linked via built-in.a or lib.a.
> > > + # Use 'ar -t' to get the list of the contained objects.
> > > + objects += subprocess.check_output([ar, '-t', file]).decode().split()
> > > + elif file.endswith('modules.order'):
> > > + # modules.order lists all the modules.
> > > + with open(file) as f:
> >
> > `file` is another builtin (or at least was in Python2), perhaps `filename`?
> >
> > > + for line in f:
> > > + ko = line.rstrip()
> > > + base, ext = os.path.splitext(ko)
> > > + if ext != '.ko':
> > > + sys.exit('{}: mobule path must end with .ko'.format(ko))
> > > + mod = base + '.mod'
> > > + # The first line of *.mod lists the objects that
> > > + # compose the module.
> >
> > This comment and the one above it uses tabs for indentation vs spaces
> > for the rest of the file. I use
> > https://github.com/nickdesaulniers/dotfiles/blob/a90865a9ea48bbefa0082f7508607fdeb361e801/.vimrc#L37-L43
> > to help me catch these.
>
> Awesome. Copied to mine.
>
>
>
> > > + with open(mod) as mod_f:
> > > + objects += mod_f.readline().split()
> > > + else:
> > > + sys.exit('{}: unknown file type'.format(file))
> >
> > Consider breaking up this one long function into multiple, perhaps the
> > above could just return `objects`?
>
>
>
> I thought that returning a big list causes needless memory-copy.
> If we do not need to be worried too much,
> I can make it a helper function.
>
>
> >
> > >
> > > compile_commands = []
> > > - for dirpath, _, filenames in os.walk(directory):
> > > - for filename in filenames:
> > > - if not filename_matcher.match(filename):
> > > - continue
> > > - filepath = os.path.join(dirpath, filename)
> > > -
> > > - with open(filepath, 'rt') as f:
> > > - line = f.readline()
> > > - result = line_matcher.match(line)
> > > - if result:
> > > - try:
> > > - entry = process_line(directory, dirpath,
> > > - result.group(1), result.group(2))
> > > - compile_commands.append(entry)
> > > - except ValueError as err:
> > > - logging.info('Could not add line from %s: %s',
> > > - filepath, err)
> > > + cwd = os.getcwd()
> > > + for object in objects:
> > > + dir, notdir = os.path.split(object)
> >
> > `object` is a builtin Class in python. I'm not sure if it's quite
> > considered a keyword, but maybe a different identifier would be nicer,
> > like `object_file` or something?
>
>
> Not a keyword, but 'object' is a class, yes.
> Not sure about 'file'.
>
>
> $ python
> Python 3.8.2 (default, Jul 16 2020, 14:00:26)
> [GCC 9.3.0] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import keyword
> >>> keyword.iskeyword("import")
> True
> >>> keyword.iskeyword("if")
> True
> >>> keyword.iskeyword("file")
> False
> >>> keyword.iskeyword("object")
> False
> >>> object
> <class 'object'>
> >>> file
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> NameError: name 'file' is not defined
> >>>
>
>
> If this is a problem, I can rename it.
>
>
>
>
> > > + cmd_file = os.path.join(dir, '.' + notdir + '.cmd')
> > > + with open(cmd_file, 'rt') as f:
> > > + line = f.readline()
> > > + result = line_matcher.match(line)
> >
> > ^ combine statements.
>
> OK.
>
>
> > > + if result:
> > > + entry = process_line(cwd, result.group(1), result.group(2))
> > > + compile_commands.append(entry)
> > >
> > > with open(output, 'wt') as f:
> > > json.dump(compile_commands, f, indent=2, sort_keys=True)
> > >
> > > - count = len(compile_commands)
> > > - if count < _LOW_COUNT_THRESHOLD:
> > > - logging.warning(
> > > - 'Found %s entries. Have you compiled the kernel?', count)
> > > -
> > > -
> > > if __name__ == '__main__':
> > > main()
> > > --
> > > 2.25.1
> > >
> >
> > Thank you for your assistance and help enabling these tools.
> >
> > --
> > Thanks,
> > ~Nick Desaulniers
> >
> > --
> > You received this message because you are subscribed to the Google Groups "Clang Built Linux" group.
> > To unsubscribe from this group and stop receiving emails from it, send an email to clang-built-linux+unsubscribe@xxxxxxxxxxxxxxxx.
> > To view this discussion on the web visit https://groups.google.com/d/msgid/clang-built-linux/CAKwvOdkL%3D667%2Bcw_Rxq_5zaOKeTTptsMaxkkSXBic9QxozOWVg%40mail.gmail.com.
>
>
>
> --
> Best Regards
> Masahiro Yamada



--
Best Regards
Masahiro Yamada