Re: [PATCH V5 00/10] AMD XDNA driver
From: Lizhi Hou
Date: Tue Oct 29 2024 - 11:25:10 EST
On 10/25/24 15:02, Jeffrey Hugo wrote:
On 10/25/2024 3:28 PM, Lizhi Hou wrote:
On 10/25/24 10:55, Jeffrey Hugo wrote:
On 10/21/2024 10:19 AM, Lizhi Hou wrote:
This patchset introduces a new Linux Kernel Driver, amdxdna for AMD
NPUs.
The driver is based on Linux accel subsystem.
NPU (Neural Processing Unit) is an AI inference accelerator integrated
into AMD client CPUs. NPU enables efficient execution of Machine Learning
applications like CNNs, LLMs, etc. NPU is based on AMD XDNA
architecture [1].
AMD NPU consists of the following components:
- Tiled array of AMD AI Engine processors.
- Micro Controller which runs the NPU Firmware responsible for
command processing, AIE array configuration, and execution
management.
- PCI EP for host control of the NPU device.
- Interconnect for connecting the NPU components together.
- SRAM for use by the NPU Firmware.
- Address translation hardware for protected host memory access by the
NPU.
NPU supports multiple concurrent fully isolated contexts. Concurrent
contexts may be bound to AI Engine array spatially and/or temporally.
The driver is licensed under GPL-2.0 except for UAPI header which is
licensed GPL-2.0 WITH Linux-syscall-note.
User mode driver stack consists of XRT [2] and AMD AIE Plugin for
IREE [3].
The firmware for the NPU is distributed as a closed source binary, and has
already been pushed to the DRM firmware repository [4].
[1]https://www.amd.com/en/technologies/xdna.html
[2]https://github.com/Xilinx/XRT
[3]https://github.com/nod-ai/iree-amd-aie
[4]https://gitlab.freedesktop.org/drm/firmware/-/tree/amd-ipu-staging/amdnpu
Changes since v4:
- Fix lockdep errors
- Use __u* structure for struct aie_error
One nit, when you send the next version would you please either To:
or Cc: me on the entire series? I only get pieces in my inbox which
is mildly annoying on my end.
Sure.
Looks like we are getting close here. One procedural question I
have, do you have commit permissions to drm-misc?
No, I do not have commit permissions yet.
You should apply for access. Assuming this series is ready before
that goes through, I'll apply it.
I applied the series to drm-misc-next and tried to build. Got the
following errors -
Could you share the build command line? So I can reproduce and verify
my fix.
The command is simple:
make -j20
The system details, in case it somehow matters:
Ubuntu 22.04 w/ 5.15 kernel
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.3 LTS
Release: 22.04
Codename: jammy
$ uname -a
Linux jhugo-lnx 5.15.0-89-generic #99-Ubuntu SMP Mon Oct 30 20:42:41
UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
The kernel config is probably the relevant piece. When I first built
after applying the series, I was asked to choose what to do with the
new config item. I selected =m.
.config can be found at
https://gist.github.com/quic-jhugo/4cc249b1e3ba127039fbc709a513a432
I used "make M=drivers/accel/amdxdna" and did not reproduce the error
with drm-misc-next. It looks like the build robot did not complain about
the patch either.
$ git branch
* drm-misc-next
$ make M=drivers/accel/amdxdna
CC [M] drivers/accel/amdxdna/aie2_ctx.o
CC [M] drivers/accel/amdxdna/aie2_error.o
CC [M] drivers/accel/amdxdna/aie2_message.o
CC [M] drivers/accel/amdxdna/aie2_pci.o
CC [M] drivers/accel/amdxdna/aie2_psp.o
CC [M] drivers/accel/amdxdna/aie2_smu.o
CC [M] drivers/accel/amdxdna/aie2_solver.o
CC [M] drivers/accel/amdxdna/amdxdna_ctx.o
CC [M] drivers/accel/amdxdna/amdxdna_gem.o
CC [M] drivers/accel/amdxdna/amdxdna_mailbox.o
CC [M] drivers/accel/amdxdna/amdxdna_mailbox_helper.o
CC [M] drivers/accel/amdxdna/amdxdna_pci_drv.o
CC [M] drivers/accel/amdxdna/amdxdna_sysfs.o
CC [M] drivers/accel/amdxdna/npu1_regs.o
CC [M] drivers/accel/amdxdna/npu2_regs.o
CC [M] drivers/accel/amdxdna/npu4_regs.o
CC [M] drivers/accel/amdxdna/npu5_regs.o
LD [M] drivers/accel/amdxdna/amdxdna.o
MODPOST drivers/accel/amdxdna/Module.symvers
CC [M] drivers/accel/amdxdna/amdxdna.mod.o
CC [M] drivers/accel/amdxdna/.module-common.o
LD [M] drivers/accel/amdxdna/amdxdna.ko
$
CC [M] drivers/accel/amdxdna/aie2_ctx.o
CC [M] drivers/accel/amdxdna/aie2_error.o
CC [M] drivers/accel/amdxdna/aie2_message.o
CC [M] drivers/accel/amdxdna/aie2_pci.o
CC [M] drivers/accel/amdxdna/aie2_psp.o
CC [M] drivers/accel/amdxdna/aie2_smu.o
CC [M] drivers/accel/amdxdna/aie2_solver.o
CC [M] drivers/accel/amdxdna/amdxdna_ctx.o
CC [M] drivers/accel/amdxdna/amdxdna_gem.o
CC [M] drivers/accel/amdxdna/amdxdna_mailbox.o
CC [M] drivers/accel/amdxdna/amdxdna_mailbox_helper.o
CC [M] drivers/accel/amdxdna/amdxdna_pci_drv.o
CC [M] drivers/accel/amdxdna/amdxdna_sysfs.o
CC [M] drivers/accel/amdxdna/npu1_regs.o
CC [M] drivers/accel/amdxdna/npu2_regs.o
CC [M] drivers/accel/amdxdna/npu4_regs.o
CC [M] drivers/accel/amdxdna/npu5_regs.o
AR drivers/base/firmware_loader/built-in.a
AR drivers/base/built-in.a
In file included from drivers/accel/amdxdna/aie2_message.c:19:
drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_get_op’:
drivers/accel/amdxdna/amdxdna_ctx.h:112:16: error: implicit
declaration of function ‘FIELD_GET’
[-Werror=implicit-function-declaration]
112 | return FIELD_GET(AMDXDNA_CMD_OPCODE, cmd->header);
| ^~~~~~~~~
In file included from drivers/accel/amdxdna/amdxdna_gem.c:15:
drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_get_op’:
drivers/accel/amdxdna/amdxdna_ctx.h:112:16: error: implicit
declaration of function ‘FIELD_GET’
[-Werror=implicit-function-declaration]
112 | return FIELD_GET(AMDXDNA_CMD_OPCODE, cmd->header);
| ^~~~~~~~~
In file included from drivers/accel/amdxdna/aie2_psp.c:11:
drivers/accel/amdxdna/aie2_psp.c: In function ‘psp_exec’:
drivers/accel/amdxdna/aie2_psp.c:62:34: error: implicit declaration
of function ‘FIELD_GET’ [-Werror=implicit-function-declaration]
62 | FIELD_GET(PSP_STATUS_READY, ready),
| ^~~~~~~~~
./include/linux/iopoll.h:47:21: note: in definition of macro
‘read_poll_timeout’
47 | if (cond) \
| ^~~~
drivers/accel/amdxdna/aie2_psp.c:61:15: note: in expansion of macro
‘readx_poll_timeout’
61 | ret = readx_poll_timeout(readl, PSP_REG(psp,
PSP_STATUS_REG), ready,
| ^~~~~~~~~~~~~~~~~~
drivers/accel/amdxdna/amdxdna_ctx.h: In function
‘amdxdna_cmd_set_state’:
drivers/accel/amdxdna/amdxdna_ctx.h:121:24: error: implicit
declaration of function ‘FIELD_PREP’
[-Werror=implicit-function-declaration]
121 | cmd->header |= FIELD_PREP(AMDXDNA_CMD_STATE, s);
| ^~~~~~~~~~
drivers/accel/amdxdna/amdxdna_ctx.h: In function
‘amdxdna_cmd_set_state’:
drivers/accel/amdxdna/amdxdna_ctx.h:121:24: error: implicit
declaration of function ‘FIELD_PREP’
[-Werror=implicit-function-declaration]
121 | cmd->header |= FIELD_PREP(AMDXDNA_CMD_STATE, s);
| ^~~~~~~~~~
In file included from drivers/accel/amdxdna/aie2_pci.c:22:
drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_get_op’:
drivers/accel/amdxdna/amdxdna_ctx.h:112:16: error: implicit
declaration of function ‘FIELD_GET’
[-Werror=implicit-function-declaration]
112 | return FIELD_GET(AMDXDNA_CMD_OPCODE, cmd->header);
| ^~~~~~~~~
In file included from drivers/accel/amdxdna/aie2_ctx.c:18:
drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_get_op’:
drivers/accel/amdxdna/amdxdna_ctx.h:112:16: error: implicit
declaration of function ‘FIELD_GET’
[-Werror=implicit-function-declaration]
112 | return FIELD_GET(AMDXDNA_CMD_OPCODE, cmd->header);
| ^~~~~~~~~
drivers/accel/amdxdna/amdxdna_ctx.h: In function
‘amdxdna_cmd_set_state’:
drivers/accel/amdxdna/amdxdna_ctx.h:121:24: error: implicit
declaration of function ‘FIELD_PREP’
[-Werror=implicit-function-declaration]
121 | cmd->header |= FIELD_PREP(AMDXDNA_CMD_STATE, s);
| ^~~~~~~~~~
In file included from drivers/accel/amdxdna/amdxdna_ctx.c:16:
drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_get_op’:
drivers/accel/amdxdna/amdxdna_ctx.h:112:16: error: implicit
declaration of function ‘FIELD_GET’
[-Werror=implicit-function-declaration]
112 | return FIELD_GET(AMDXDNA_CMD_OPCODE, cmd->header);
| ^~~~~~~~~
cc1: all warnings being treated as errors
drivers/accel/amdxdna/amdxdna_ctx.h: In function
‘amdxdna_cmd_set_state’:
drivers/accel/amdxdna/amdxdna_ctx.h:121:24: error: implicit
declaration of function ‘FIELD_PREP’
[-Werror=implicit-function-declaration]
121 | cmd->header |= FIELD_PREP(AMDXDNA_CMD_STATE, s);
| ^~~~~~~~~~
drivers/accel/amdxdna/aie2_ctx.c: In function ‘aie2_hwctx_restart’:
drivers/accel/amdxdna/aie2_ctx.c:114:9: error: too few arguments to
function ‘drm_sched_start’
114 | drm_sched_start(&hwctx->priv->sched);
| ^~~~~~~~~~~~~~~
In file included from ./include/trace/events/amdxdna.h:12,
from drivers/accel/amdxdna/aie2_ctx.c:13:
./include/drm/gpu_scheduler.h:593:6: note: declared here
593 | void drm_sched_start(struct drm_gpu_scheduler *sched, int
errno);
| ^~~~~~~~~~~~~~~
make[5]: *** [scripts/Makefile.build:229:
drivers/accel/amdxdna/aie2_psp.o] Error 1
make[5]: *** Waiting for unfinished jobs....
drivers/accel/amdxdna/amdxdna_ctx.h: In function
‘amdxdna_cmd_set_state’:
drivers/accel/amdxdna/amdxdna_ctx.h:121:24: error: implicit
declaration of function ‘FIELD_PREP’
[-Werror=implicit-function-declaration]
121 | cmd->header |= FIELD_PREP(AMDXDNA_CMD_STATE, s);
| ^~~~~~~~~~
In file included from drivers/accel/amdxdna/amdxdna_pci_drv.c:18:
drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_get_op’:
drivers/accel/amdxdna/amdxdna_ctx.h:112:16: error: implicit
declaration of function ‘FIELD_GET’
[-Werror=implicit-function-declaration]
112 | return FIELD_GET(AMDXDNA_CMD_OPCODE, cmd->header);
| ^~~~~~~~~
cc1: all warnings being treated as errors
make[5]: *** [scripts/Makefile.build:229:
drivers/accel/amdxdna/aie2_ctx.o] Error 1
drivers/accel/amdxdna/amdxdna_ctx.h: In function
‘amdxdna_cmd_set_state’:
drivers/accel/amdxdna/amdxdna_ctx.h:121:24: error: implicit
declaration of function ‘FIELD_PREP’
[-Werror=implicit-function-declaration]
121 | cmd->header |= FIELD_PREP(AMDXDNA_CMD_STATE, s);
| ^~~~~~~~~~
drivers/accel/amdxdna/amdxdna_mailbox.c: In function
‘xdna_mailbox_send_msg’:
drivers/accel/amdxdna/amdxdna_mailbox.c:444:26: error: implicit
declaration of function ‘FIELD_PREP’
[-Werror=implicit-function-declaration]
444 | header->sz_ver = FIELD_PREP(MSG_BODY_SZ,
msg->send_size) |
| ^~~~~~~~~~
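For context on the FIELD_GET/FIELD_PREP errors above: in the kernel these macros are defined in include/linux/bitfield.h, so an implicit declaration usually means that header is not reaching the failing files under this particular config. The thread does not say which fix was chosen. The following is only a minimal userspace sketch of what the two macros do, assuming a GCC/Clang compiler for __builtin_ffsll. Every SKETCH_* name and the bit layout are hypothetical and are not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified userspace stand-ins for the kernel's GENMASK(), FIELD_PREP()
 * and FIELD_GET(). The real macros live in <linux/bits.h> and
 * <linux/bitfield.h> and add compile-time sanity checks omitted here.
 */
#define SKETCH_GENMASK(h, l) \
	((~UINT32_C(0) << (l)) & (~UINT32_C(0) >> (31 - (h))))

/* Bit position of the lowest set bit in the mask (GCC/Clang builtin). */
#define SKETCH_SHIFT(mask) (__builtin_ffsll((long long)(mask)) - 1)

/* Shift a value into the position described by the mask. */
#define SKETCH_FIELD_PREP(mask, val) \
	(((uint32_t)(val) << SKETCH_SHIFT(mask)) & (mask))

/* Extract the field described by the mask from a register value. */
#define SKETCH_FIELD_GET(mask, reg) \
	(((uint32_t)(reg) & (mask)) >> SKETCH_SHIFT(mask))

/* Hypothetical header layout, for illustration only. */
#define SKETCH_CMD_STATE  SKETCH_GENMASK(3, 0)
#define SKETCH_CMD_OPCODE SKETCH_GENMASK(23, 16)

/* Read the opcode field out of a packed command header. */
static uint32_t sketch_cmd_get_op(uint32_t header)
{
	return SKETCH_FIELD_GET(SKETCH_CMD_OPCODE, header);
}

/* Read the state field out of a packed command header. */
static uint32_t sketch_cmd_get_state(uint32_t header)
{
	return SKETCH_FIELD_GET(SKETCH_CMD_STATE, header);
}

/* Clear the old state bits, then store the new state. */
static uint32_t sketch_cmd_set_state(uint32_t header, uint32_t s)
{
	header &= ~SKETCH_CMD_STATE;
	header |= SKETCH_FIELD_PREP(SKETCH_CMD_STATE, s);
	return header;
}
```

In kernel code the equivalent would simply be to include <linux/bitfield.h> wherever FIELD_GET or FIELD_PREP is used, rather than relying on another header to pull it in indirectly.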
You also have the following checkpatch issues -
Could you share the command you used? I tried to use 'dim
checkpatch' and it did not find the misspelling issue.
./scripts/checkpatch.pl --strict --codespell *.patch
Note, --codespell requires some local setup. I believe the comments
in the checkpatch.pl script are fairly straightforward. I use a copy
of the database from the github that is rather recent. The Ubuntu
distro package is really out of date and I don't think I looked to see
if there is a python pip version. Grabbing the one file from the
github repo seemed simple enough.
I was able to reproduce with your suggestions. Thanks a lot.
Lizhi
-Jeff