Re: [PATCH v3 1/7] ACPI/PPTT: Add Processor Properties Topology Table parsing

From: Tomasz Nowicki
Date: Thu Oct 19 2017 - 01:19:19 EST


On 18.10.2017 19:30, Jeremy Linton wrote:
On 10/18/2017 05:24 AM, Tomasz Nowicki wrote:
On 18.10.2017 07:39, Tomasz Nowicki wrote:
Hi,

On 17.10.2017 17:22, Jeremy Linton wrote:
Hi,

On 10/17/2017 08:25 AM, Tomasz Nowicki wrote:
Hi Jeremy,

I did second round of review and have some more comments, please see below:

On 12.10.2017 21:48, Jeremy Linton wrote:
ACPI 6.2 adds a new table, which describes how processing units
are related to each other in tree like fashion. Caches are
also sprinkled throughout the tree and describe the properties
of the caches in relation to other caches and processing units.

Add the code to parse the cache hierarchy and report the total
number of levels of cache for a given core using
acpi_find_last_cache_level() as well as fill out the individual
cores cache information with cache_setup_acpi() once the
cpu_cacheinfo structure has been populated by the arch specific
code.

Further, report peers in the topology using setup_acpi_cpu_topology()
to report a unique ID for each processing unit at a given level
in the tree. These unique id's can then be used to match related
processing units which exist as threads, COD (clusters
on die), within a given package, etc.

Signed-off-by: Jeremy Linton <jeremy.linton@xxxxxxx>
---
 drivers/acpi/pptt.c | 485 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 485 insertions(+)
 create mode 100644 drivers/acpi/pptt.c

diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
new file mode 100644
index 000000000000..c86715fed4a7
--- /dev/null
+++ b/drivers/acpi/pptt.c
@@ -0,1 +1,485 @@
+/*
+ * Copyright (C) 2017, ARM
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * This file implements parsing of Processor Properties Topology Table (PPTT)
+ * which is optionally used to describe the processor and cache topology.
+ * Due to the relative pointers used throughout the table, this doesn't
+ * leverage the existing subtable parsing in the kernel.
+ */
+#define pr_fmt(fmt) "ACPI PPTT: " fmt
+
+#include <linux/acpi.h>
+#include <linux/cacheinfo.h>
+#include <acpi/processor.h>
+
+/*
+ * Given the PPTT table, find and verify that the subtable entry
+ * is located within the table
+ */
+static struct acpi_subtable_header *fetch_pptt_subtable(
+ÂÂÂ struct acpi_table_header *table_hdr, u32 pptt_ref)
+{
+ÂÂÂ struct acpi_subtable_header *entry;
+
+ÂÂÂ /* there isn't a subtable at reference 0 */
+ÂÂÂ if (!pptt_ref)
+ÂÂÂÂÂÂÂ return NULL;
+
+ÂÂÂ if (pptt_ref + sizeof(struct acpi_subtable_header) > table_hdr->length)
+ÂÂÂÂÂÂÂ return NULL;
+
+ÂÂÂ entry = (struct acpi_subtable_header *)((u8 *)table_hdr + pptt_ref);
+
+ÂÂÂ if (pptt_ref + entry->length > table_hdr->length)
+ÂÂÂÂÂÂÂ return NULL;
+
+ÂÂÂ return entry;
+}
+
+static struct acpi_pptt_processor *fetch_pptt_node(
+ÂÂÂ struct acpi_table_header *table_hdr, u32 pptt_ref)
+{
+ÂÂÂ return (struct acpi_pptt_processor *)fetch_pptt_subtable(table_hdr, pptt_ref);
+}
+
+static struct acpi_pptt_cache *fetch_pptt_cache(
+ÂÂÂ struct acpi_table_header *table_hdr, u32 pptt_ref)
+{
+ÂÂÂ return (struct acpi_pptt_cache *)fetch_pptt_subtable(table_hdr, pptt_ref);
+}
+
+static struct acpi_subtable_header *acpi_get_pptt_resource(
+ÂÂÂ struct acpi_table_header *table_hdr,
+ÂÂÂ struct acpi_pptt_processor *node, int resource)
+{
+ÂÂÂ u32 ref;
+
+ÂÂÂ if (resource >= node->number_of_priv_resources)
+ÂÂÂÂÂÂÂ return NULL;
+
+ÂÂÂ ref = *(u32 *)((u8 *)node + sizeof(struct acpi_pptt_processor) +
+ÂÂÂÂÂÂÂÂÂÂÂÂÂ sizeof(u32) * resource);
+
+ÂÂÂ return fetch_pptt_subtable(table_hdr, ref);
+}
+
+/*
+ * given a pptt resource, verify that it is a cache node, then walk
+ * down each level of caches, counting how many levels are found
+ * as well as checking the cache type (icache, dcache, unified). If a
+ * level & type match, then we set found, and continue the search.
+ * Once the entire cache branch has been walked return its max
+ * depth.
+ */
+static int acpi_pptt_walk_cache(struct acpi_table_header *table_hdr,
+ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ int local_level,
+ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ struct acpi_subtable_header *res,
+ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ struct acpi_pptt_cache **found,
+ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ int level, int type)
+{
+ÂÂÂ struct acpi_pptt_cache *cache;
+
+ÂÂÂ if (res->type != ACPI_PPTT_TYPE_CACHE)
+ÂÂÂÂÂÂÂ return 0;
+
+ÂÂÂ cache = (struct acpi_pptt_cache *) res;
+ÂÂÂ while (cache) {
+ÂÂÂÂÂÂÂ local_level++;
+
+ÂÂÂÂÂÂÂ if ((local_level == level) &&
+ÂÂÂÂÂÂÂÂÂÂÂ (cache->flags & ACPI_PPTT_CACHE_TYPE_VALID) &&
+ÂÂÂÂÂÂÂÂÂÂÂ ((cache->attributes & ACPI_PPTT_MASK_CACHE_TYPE) == type)) {

Attributes have to be shifted:

(cache->attributes & ACPI_PPTT_MASK_CACHE_TYPE) >> 2

Hmmm, I'm not sure that is true, the top level function in this routine convert the "linux" constant to the ACPI version of that constant. In that case the "type" field is pre-shifted, so that it matches the result of just anding against the field... That is unless I messed something up, which I don't see at the moment (and the code of course has been tested with PPTT's from multiple people at this point).

For ThunderX2 I got lots of errors in dmesg:
Found duplicate cache level/type unable to determine uniqueness

So I fixed "type" macros definitions (without shifting) and shift it here which fixes the issue. As you said, it can be pre-shifted as well.

Ah, yah right... If you removed the shift per your original comment then it breaks this. Yes, and the type definitions for cache type aren't wrong in this version because the unified state has the 3rd bit set for both the 0x3 and 0x2 values and its only used to covert from the linux type to the ACPI type (and not back because we don't mess with whatever the original "detection" was). I'm not really planning on changing that because I don't think it helps "readability" (and it converts a compile time constant to a runtime shift).





+ÂÂÂÂÂÂÂÂÂÂÂ if (*found != NULL)
+ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ pr_err("Found duplicate cache level/type unable to determine uniqueness\n");

Actually I still see this error messages in my dmesg. It is because the following ThunderX2 per-core L1 and L2 cache hierarchy:

Core
ÂÂ------------------
|ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ |
| L1i -----ÂÂÂÂÂÂÂ |
|ÂÂÂÂÂÂÂÂ |ÂÂÂÂÂÂÂ |
|ÂÂÂÂÂÂÂÂÂ ----L2Â |
|ÂÂÂÂÂÂÂÂ |ÂÂÂÂÂÂÂ |
| L1d -----ÂÂÂÂÂÂÂ |
|ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ |
ÂÂ------------------

In this case we have two paths which lead to L2 cache and hit above case. Is it really error case?

No, but its not deterministic unless we mark the node, which doesn't solve the problem of a table constructed like

L1i->L2 (unified)
L1d->L2 (unified)

or various other structures which aren't disallowed by the spec and have non-deterministic real world meanings, anymore than constructing the table like:

L1i
Lid->L2(unified)

which I tend to prefer because with a structuring like that it can be deterministic (and in a way actually represents the non-coherent behavior of (most?) ARM64 core's i-caches, as could be argued the first example if the allocation policies are varied between the L2 nodes).

The really ugly bits here happen if you add another layer:

L1i->L2i-L3
L1d------^

which is why I made that an error message, not including the fact that since the levels aren't tagged the numbering and meaning isn't clear.

(the L1i in the above example might be better called an L0i to avoid throwing off the reset of the hierarchy numbering, also so it could be ignored).

Summary:

I'm not at all happy with this specification's attempt to leave out pieces of information which make parsing things more deterministic. In this case I'm happy to demote the message level, but not remove it entirely but I do think the obvious case you list shouldn't be the default one.

Lastly:

I'm assuming the final result is that the table is actually being parsed correctly despite the ugly message?

Indeed, the ThunderX2 PPTT table is being parsed so that topology shown in lstopo and lscpu is correct.

Thanks,
Tomasz