Re: [PATCH v3 3/5] mm/memcg: Cache vmstat data in percpu memcg_stock_pcp

From: Waiman Long
Date: Thu Apr 15 2021 - 13:08:38 EST


On 4/15/21 12:50 PM, Johannes Weiner wrote:
> On Tue, Apr 13, 2021 at 09:20:25PM -0400, Waiman Long wrote:
> > Before the new slab memory controller with per object byte charging,
> > charging and vmstat data update happen only when new slab pages are
> > allocated or freed. Now they are done with every kmem_cache_alloc()
> > and kmem_cache_free(). This causes additional overhead for workloads
> > that generate a lot of alloc and free calls.
> >
> > The memcg_stock_pcp is used to cache byte charge for a specific
> > obj_cgroup to reduce that overhead. To further reduce it, this patch
> > makes the vmstat data cached in the memcg_stock_pcp structure as well
> > until it accumulates a page size worth of update or when other cached
> > data change.
> >
> > On a 2-socket Cascade Lake server with instrumentation enabled and this
> > patch applied, it was found that about 17% (946796 out of 5515184) of
> > the calls to __mod_obj_stock_state() after initial boot led to an actual
> > call to mod_objcg_state(). When doing a parallel kernel build, the
> > figure was about 16% (21894614 out of 139780628). So caching the vmstat
> > data reduces the number of calls to mod_objcg_state() by more than 80%.
>
> Right, but mod_objcg_state() is itself already percpu-cached. What's
> the benefit of avoiding calls to it with another percpu cache?

There are actually two sets of vmstat data that have to be updated. One
is associated with the memcg and the other is for each lruvec within the
cgroup. By caching the update in obj_stock, we replace two writes to two
colder cachelines with one write to a hot cacheline. If you look at
patch 5, I break obj_stock into two, one for task context and one for
irq context. Disabling interrupts is no longer needed in task context,
but that is not possible when writing to the actual vmstat data arrays.
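
To illustrate the idea, here is a simplified userspace sketch. It is not
the code from the patch: stock_sketch, mod_stock_state() and
flush_to_vmstat() are hypothetical names, the two arrays merely stand in
for the memcg and lruvec counters, the flush threshold is an assumption,
and there is no real percpu handling or interrupt control.

```c
#include <assert.h>
#include <stdlib.h>

#define PAGE_SIZE 4096L
#define NR_ITEMS  2          /* hypothetical number of stat items */

/* Stand-ins for the two "colder" counter sets (memcg and lruvec). */
static long memcg_stat[NR_ITEMS];
static long lruvec_stat[NR_ITEMS];
static unsigned long flush_count;   /* how often the slow path ran */

/* Hypothetical per-CPU cache: one stat item and its pending byte delta. */
struct stock_sketch {
	int  cached_idx;     /* cached stat item, -1 if none */
	long cached_bytes;   /* delta not yet written to the counters */
};

/* Slow path: two writes, one per counter set. */
static void flush_to_vmstat(int idx, long nr)
{
	assert(idx >= 0 && idx < NR_ITEMS);
	memcg_stat[idx] += nr;
	lruvec_stat[idx] += nr;
	flush_count++;
}

/* Fast path: usually a single write to the hot cached delta. */
static void mod_stock_state(struct stock_sketch *s, int idx, long nr)
{
	/* A different item is requested: flush what is cached first. */
	if (s->cached_idx != idx) {
		if (s->cached_bytes)
			flush_to_vmstat(s->cached_idx, s->cached_bytes);
		s->cached_idx = idx;
		s->cached_bytes = 0;
	}

	s->cached_bytes += nr;

	/* Flush once a page size worth of update has accumulated. */
	if (labs(s->cached_bytes) >= PAGE_SIZE) {
		flush_to_vmstat(idx, s->cached_bytes);
		s->cached_bytes = 0;
	}
}
```

With 64-byte updates to a single item, this sketch reaches the slow path
once every 64 calls. Switching to another item forces an early flush,
which is the "other cached data change" case in the patch description.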

Cheers,
Longman