Re: [PATCH 7/9] x86/intel_rdt: Implement scheduling support for Intel RDT

From: Andy Lutomirski
Date: Thu Aug 06 2015 - 19:03:21 EST


On 08/06/2015 02:55 PM, Vikas Shivappa wrote:
Adds support for IA32_PQR_ASSOC MSR writes during task scheduling. For
Cache Allocation, MSR write would let the task fill in the cache
'subset' represented by the task's intel_rdt cgroup cache_mask.

The high 32 bits in the per processor MSR IA32_PQR_ASSOC represents the
CLOSid. During context switch kernel implements this by writing the
CLOSid of the cgroup to which the task belongs to the CPU's
IA32_PQR_ASSOC MSR.

This patch also implements a common software cache for IA32_PQR_MSR
(RMID 0:9, CLOSId 32:63) to be used by both Cache monitoring (CMT) and
Cache allocation. CMT updates the RMID where as cache_alloc updates the
CLOSid in the software cache. During scheduling when the new RMID/CLOSid
value is different from the cached values, IA32_PQR_MSR is updated.
Since the measured rdmsr latency for IA32_PQR_MSR is very high (~250
cycles) this software cache is necessary to avoid reading the MSR to
compare the current CLOSid value.

The following considerations are done for the PQR MSR write so that it
minimally impacts scheduler hot path:
- This path does not exist on any non-intel platforms.
- On Intel platforms, this would not exist by default unless CGROUP_RDT
is enabled.
- remains a no-op when CGROUP_RDT is enabled and intel SKU does not
support the feature.
- When feature is available and enabled, never does MSR write till the
user manually creates a cgroup directory *and* assigns a cache_mask
different from root cgroup directory. Since the child node inherits
the parents cache mask, by cgroup creation there is no scheduling hot
path impact from the new cgroup.
- MSR write is only done when there is a task with different Closid is
scheduled on the CPU. Typically if the task groups are bound to be
scheduled on a set of CPUs, the number of MSR writes is greatly
reduced.
- A per CPU cache of CLOSids is maintained to do the check so that we
dont have to do a rdmsr which actually costs a lot of cycles.
- For cgroup directories having same cache_mask the CLOSids are reused.
This minimizes the number of CLOSids used and hence reduces the MSR
write frequency.

What happens if a user process sets a painfully restrictive CLOS and then spends most of its time in the kernel doing work on behalf of unrelated tasks? Does performance suck?

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/