Re: [mm 4.15-rc8] Random oopses under memory pressure.

From: Al Viro
Date: Fri Jan 19 2018 - 21:06:36 EST


On Fri, Jan 19, 2018 at 02:53:25PM -0800, Linus Torvalds wrote:

> It would probably be good to add the size too, just to explain why
> it's potentially expensive.
>
> That said, apparently we do have hundreds of them, with just
> cpufreq_frequency_table having a ton. Maybe some are hidden in macros
> and removing one removes a lot.

cpufreq_table_find_index_...(), mostly.

> The real problem is that sometimes the subtraction is simply the right
> thing to do, and there's no sane way to say "yeah, this is one of
> those cases you shouldn't warn about".

FWIW, the sizes of the most common ones are
91 sizeof struct cpufreq_frequency_table = 12
Almost all of those come from
cpufreq_for_each_valid_entry(pos, table)
if (....)
return pos - table;
and I wonder if we would be better off with something like
#define cpufreq_for_each_valid_entry(pos, table, idx) \
for (pos = table, idx = 0; pos->frequency != CPUFREQ_TABLE_END; pos++, idx++) \
if (pos->frequency == CPUFREQ_ENTRY_INVALID) \
continue; \
else
so that those loops would become
cpufreq_for_each_valid_entry(pos, table, idx)
if (....)
return idx;
36 sizeof struct Indirect = 24
21 sizeof struct ips_scb = 216
18 sizeof struct runlist_element = 24
13 sizeof struct zone = 1728
Some are from
#define zone_idx(zone) ((zone) - (zone)->zone_pgdat->node_zones)
but there's
static inline int zone_id(const struct zone *zone)
{
struct pglist_data *pgdat = zone->zone_pgdat;

return zone - pgdat->node_zones;
}
and a couple of places where we have
for (zone = node_zones; zone - node_zones < MAX_NR_ZONES; ++zone) {
Those bloody well ought to be
for (zone = node_zones, end = node_zones + MAX_NR_ZONES; zone < end; zone++) {
13 sizeof struct vring = 40
11 sizeof struct usbhsh_device = 24
10 sizeof struct xpc_partition = 888
9 sizeof struct skge_element = 40
9 sizeof struct lock_class = 400
9 sizeof struct hstate = 29872
That little horror comes from get_hstate_idx() and hstate_index(). Code generated
for division is
movabsq $-5542915600080909725, %rax
sarq $4, %rdi
imulq %rdi, %rax
7 sizeof struct nvme_rdma_queue = 312
7 sizeof struct iso_context = 208
6 sizeof struct i915_power_well = 48
6 sizeof struct hpet_dev = 168
6 sizeof struct ext4_extent = 12
6 sizeof struct esas2r_target = 120
5 sizeof struct iio_chan_spec = 152
5 sizeof struct hwspinlock = 96
4 sizeof struct myri10ge_slice_state = 704
4 sizeof struct ext4_extent_idx = 12
Another interesting-looking one is struct vhost_net_virtqueue (18080 bytes)

Note that those sizes are rather sensitive to lockdep, spinlock debugging, etc.