Stats on automatically backporting Linux with Coccinelle

From: Luis R. Rodriguez
Date: Sat Sep 19 2015 - 03:50:24 EST


Thanks to everyone who answered questions about the use of Linux
backports [0] in the industry. A paper evaluating the use of Coccinelle
to help automatically backport the Linux kernel has now been published
(EDCC 2015) and is available [1]. Below are some stats, along with the
definitions of the metrics used to reflect the efficiency of, and the
work saved by, the Coccinelle rules used on the backports project.

In summary, ~56% of the backporting work is now generated automatically
using Coccinelle. The rest of the stats below give exact efficiency
metrics for each Coccinelle rule used on the project. Eventually, if
folks like it, we can strive to automate generating these stats for each
release, graph progress over time and keep evaluating how much work each
rule is saving us.

A curious observation we made while working on these metrics is that a
simple rule with a maintenance efficiency of only ~1 has a very similar
run time impact to that of more complex rules. On a laptop this meant an
impact of about 3-7s per new Coccinelle rule file. What this means in
practice is that even if you have a simple Coccinelle rule that
backports just one collateral evolution and addresses only one device
driver, adding it to the project is desirable and encouraged: the run
time impact is always linear, and we expect its use to help with long
term maintenance (less patch refreshing and manual patch hunk fixing)
and to do more backport work automatically in case the collateral
evolution ends up being needed at any point in time by other drivers /
new code.

We define two metrics of efficiency, development efficiency and
maintenance efficiency. For development efficiency, we start with the
number of insertions and deletions that a semantic patch generates,
ignoring context information, as reported by git diff --stat, and take
the ratio of this number with the size of the semantic patch, exclusive
of comments and whitespace. The number of insertions and deletions
represents the number of manual changes required when modifying the
code. Development efficiency thus represents the initial coding savings
induced by using semantic patches. For maintenance efficiency, we
compute the same ratio, but this time consider the complete size of the
patch, not only the insertions and deletions, but also all the metadata
information contained within the patch generated by the semantic patch,
including file names, file offsets, and (unmodified) context lines; all
of this metadata must also be kept up to date so that the patch command
can apply the patch to the relevant files. A development (resp.,
maintenance) efficiency value of 1 means the semantic patch has the same
number of lines as the changes (resp., lines) in the patch series it
replaces. A development (resp., maintenance) efficiency value of 2 means
the semantic patch is producing twice as many changes (resp., patch
lines) as the number of lines in the semantic patch.
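
As a rough illustration of the three sizes these definitions rely on,
here is a minimal Python sketch. It is not the tool from [2]; the
function names, the sample diff and the sample rule are all hypothetical.
It assumes a git-format unified diff, and it assumes that "exclusive of
comments and whitespace" can be approximated by skipping blank lines and
lines that start with //.

```python
def diff_sizes(diff_text):
    """Return (changed, total) line counts for a git-format unified diff.

    changed: inserted plus deleted lines inside hunks.
    total:   every line of the diff, headers and context included.
    """
    lines = diff_text.splitlines()
    changed = 0
    in_hunk = False
    for line in lines:
        if line.startswith("diff --git"):
            in_hunk = False   # file headers (---/+++) follow, not changes
        elif line.startswith("@@"):
            in_hunk = True    # hunk header: changes and context follow
        elif in_hunk and line.startswith(("+", "-")):
            changed += 1
    return changed, len(lines)


def smpl_size(smpl_text):
    """Count semantic patch lines, skipping blank and // comment lines.

    Assumption: this only approximates how the size is really computed.
    """
    count = 0
    for line in smpl_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("//"):
            continue
        count += 1
    return count


# Hypothetical generated patch and semantic patch, for illustration only.
SAMPLE_DIFF = """\
diff --git a/drivers/net/example.c b/drivers/net/example.c
--- a/drivers/net/example.c
+++ b/drivers/net/example.c
@@ -10,4 +10,5 @@ static int example_probe(void)
     int ret;
-    ret = old_call(dev);
+    ret = new_call(dev);
+    example_extra(dev);
     return ret;
 }
"""

SAMPLE_SMPL = """\
// Hypothetical rule, for illustration only
@@
expression dev;
@@

-old_call(dev)
+new_call(dev)
"""

changed, total = diff_sizes(SAMPLE_DIFF)
size = smpl_size(SAMPLE_SMPL)
print(changed, total, size)            # 3 10 5
print(changed / size, total / size)    # 0.6 2.0
```

For this toy input the development efficiency is 3/5 = 0.6 and the
maintenance efficiency is 10/5 = 2.0.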

The tools used to generate these stats are up on github temporarily [2].
Again, if we like these sorts of stats we should probably consider
porting the tools to Python, keeping a record of the stats and
integrating them into either the backports or the Coccinelle project.

Maintenance efficiency:

Size of patch (insertions + deletions + context)
--------------------------------------------------
Size of SmPL patch

Development efficiency:

Size of relevant changes of patch (insertions + deletions)
----------------------------------------------------------
Size of SmPL patch
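
The two ratios can be checked by hand against the table in this message.
A minimal Python sketch, assuming the diff-wc column is the full patch
size, the diffstat column is insertions + deletions and the clean column
is the size of the SmPL patch (the function name is hypothetical):

```python
def efficiencies(diff_wc, diffstat, smpl_size):
    """Return (development, maintenance) efficiency for one rule."""
    dev = diffstat / smpl_size     # insertions + deletions per SmPL line
    maint = diff_wc / smpl_size    # full patch lines per SmPL line
    return dev, maint


# Counts reported for 0001-netlink-portid.cocci in this message.
dev, maint = efficiencies(diff_wc=636, diffstat=163, smpl_size=15)
print(f"{dev:.4f} {maint:.4f}")    # 10.8667 42.4000
```

This matches the 10.8667 and 42.4 reported for that rule.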

-----------------------------------------------------------------------------
Development and Maintenance efficiency metrics:
-----------------------------------------------------------------------------
dev-efficiency  maint-efficiency  diff-wc  diffstat  clean  SmPL-Patch
0.666667        1.88889           17       6         9      skb_no_xmit_more.cocci
1.16667         2.79167           67       28        24     ptp_getsettime64.cocci
0.142857        1                 14       2         14     features_check.cocci
1.18182         5.45455           60       13        11     0055-netdev-tstats.cocci
1.55906         3.55118           451      198       127    0054-struct-proto_ops-sig.cocci
0.666667        1.88889           17       6         9      no-pfmemalloc.cocci
0.634146        1.4878            61       26        41     set_vf_rate.cocci
3.75            10.625            85       30        8      igb_pci_error_handlers.cocci
1.07692         4.23077           55       14        13     ethtool_cmd_mdix.cocci
0.588235        1.23529           21       10        17     rxnfc.cocci
0.285714        1.53571           43       8         28     get_module.cocci
0.285714        1.53571           43       8         28     ethtool_eee.cocci
0.714286        2.28571           16       5         7      skb_no_fcs.cocci
0.25            1.59375           51       8         32     set_vf_spoofchk.cocci
0.428571        2.85714           40       6         14     sriov_configure.cocci
0.87037         2.7037            146      47        54     0031-sk_data_ready.cocci
4.4             17.9              179      44        10     genl-const.cocci
6.88889         48.1111           433      62        9      0019-usb_driver_lpm.cocci
0.571429        4.14286           58       8         14     get_ts_info.cocci
10.8667         42.4              636      163       15     0001-netlink-portid.cocci
2.93333         16.1167           967      176       60     0002-no_dmabuf.cocci
0.512821        1.89744           74       20        39     0002-group_attr_bus.cocci
0.769231        2.79487           109      30        39     0001-group_attr_class.cocci
1.47588         5.67363           3529     918       622    all-SmPL.cocci
-----------------------------------------------------------------------------
Patch total diff wc -l: 2790
SmPL total diff wc -l: 3529
Total total diff wc -l: 6319
---------------------------------------
Patch diff % contribution: 44.1526
SmPL diff % contribution: 55.8474
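
The percentage contributions above follow directly from the two totals.
A minimal Python sketch of that arithmetic (variable names are
hypothetical):

```python
patch_lines = 2790   # Patch total diff wc -l
smpl_lines = 3529    # SmPL total diff wc -l

total_lines = patch_lines + smpl_lines

# Share of the total backport diff carried by each approach, in percent.
patch_share = 100 * patch_lines / total_lines
smpl_share = 100 * smpl_lines / total_lines

print(total_lines)             # 6319
print(f"{patch_share:.4f}")    # 44.1526
print(f"{smpl_share:.4f}")     # 55.8474
```

The SmPL share is the ~56% figure quoted in the summary.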

[0] http://backports.wiki.kernel.org
[1] http://coccinelle.lip6.fr/papers/backport_edcc15.pdf
[2] https://github.com/mcgrof/backports-cocci-stats

Luis
