3TB memory and high system load 80%+

From: jesper
Date: Fri Mar 13 2015 - 12:31:34 EST


Hi.

We have severe issues getting an Intel(R) Xeon(R) CPU E7-8857 v2 @ 3.00GHz
48-core server to perform well.

We have just set up a new server for a PostgreSQL database with 3TB of
memory, primarily for disk I/O caching. The server is running Ubuntu 12.04
on kernel 3.13.0-43-generic. The only change from the stock configuration is that we have run:

echo never > /sys/kernel/mm/transparent_hugepage/enabled
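For anyone reproducing this setup, here is a minimal Python sketch to confirm that the setting took effect. The kernel lists every THP mode in that sysfs file and brackets the active one (e.g. "always madvise [never]"). The helper name active_thp_mode is hypothetical and not part of any library.

```python
from pathlib import Path

THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def active_thp_mode(text: str) -> str:
    """Return the bracketed (active) mode from a THP sysfs file's contents.

    The file lists all available modes and marks the active one with
    brackets, e.g. "always madvise [never]".
    """
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active mode found in {text!r}")


if __name__ == "__main__":
    if THP_ENABLED.exists():
        print(active_thp_mode(THP_ENABLED.read_text()))
    else:
        print("THP sysfs file not present on this system")
```

After the echo above, this should print "never".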

We have "system load spikes" on it: it occasionally peaks at 80-90%
system load for minutes and then drifts back to "normal". When that happens,
all other activity suffers.

Before we disabled transparent hugepages, the system would pretty much stall
completely whenever memory pressure occurred.

sar reports:

13:05:01  CPU  %user  %nice  %system  %iowait  %steal  %idle
16:25:01  all  11.20   0.00     2.11     3.75    0.00  82.94
16:30:01  all  12.55   0.00     2.67     3.63    0.00  81.15
16:35:01  all  12.24   0.00     3.68     3.61    0.00  80.47
16:40:01  all  11.74   0.00     2.31     3.46    0.00  82.48
16:45:01  all  15.12   0.00     4.31     3.85    0.00  76.71
16:50:01  all  43.04   0.00     8.89     5.24    0.00  42.82
16:55:01  all  46.14   0.00    13.53     5.29    0.00  35.03
17:05:01  all  27.24   0.00    40.16     3.77    0.00  28.83
17:10:01  all  17.24   0.00    57.76     2.68    0.00  22.32
17:15:01  all  44.89   0.00    11.14     4.44    0.00  39.53
17:20:03  all  29.73   0.00    33.77     3.99    0.00  32.51
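To make the pattern in the sar output explicit, here is a small Python sketch. It flags intervals where %system exceeds a threshold and shows what fraction of busy (user + nice + system) CPU time went to the kernel. The 20% threshold and the function name high_system_intervals are illustrative assumptions. The rows are copied from the sar output above.

```python
# Rows copied from the sar output above:
# (time, %user, %nice, %system, %iowait, %steal, %idle)
SAR_ROWS = [
    ("16:25:01", 11.20, 0.00, 2.11, 3.75, 0.00, 82.94),
    ("16:30:01", 12.55, 0.00, 2.67, 3.63, 0.00, 81.15),
    ("16:35:01", 12.24, 0.00, 3.68, 3.61, 0.00, 80.47),
    ("16:40:01", 11.74, 0.00, 2.31, 3.46, 0.00, 82.48),
    ("16:45:01", 15.12, 0.00, 4.31, 3.85, 0.00, 76.71),
    ("16:50:01", 43.04, 0.00, 8.89, 5.24, 0.00, 42.82),
    ("16:55:01", 46.14, 0.00, 13.53, 5.29, 0.00, 35.03),
    ("17:05:01", 27.24, 0.00, 40.16, 3.77, 0.00, 28.83),
    ("17:10:01", 17.24, 0.00, 57.76, 2.68, 0.00, 22.32),
    ("17:15:01", 44.89, 0.00, 11.14, 4.44, 0.00, 39.53),
    ("17:20:03", 29.73, 0.00, 33.77, 3.99, 0.00, 32.51),
]

# Hypothetical cut-off for what counts as "high" %system; not defined by sar.
SYSTEM_THRESHOLD = 20.0


def high_system_intervals(rows, threshold=SYSTEM_THRESHOLD):
    """Return (time, %system, kernel share of busy CPU in %) for rows above threshold."""
    flagged = []
    for time, user, nice, system, _iowait, _steal, _idle in rows:
        if system > threshold:
            busy = user + nice + system  # time the CPUs were actually executing code
            flagged.append((time, system, round(100 * system / busy, 1)))
    return flagged


if __name__ == "__main__":
    for time, system, share in high_system_intervals(SAR_ROWS):
        print(f"{time}  %system={system:.2f}  kernel share of busy CPU={share}%")
```

On the table above, this flags 17:05:01, 17:10:01 and 17:20:03. In each of those intervals, the kernel accounts for more than half of the busy CPU time.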

During the period above, the server was "only" running pg_dump and pg_restore,
with 2 PostgreSQL instances running. There was no network/NFS/etc. activity at
the same time, and never more than 48 processes active at once.

Any suggestions or ideas on what to improve, and how, are very welcome, both in
terms of tunables and potential kernel fixes.
