Re: [PATCH v6 17/17] powerpc/vas: Document FTW API/usage

From: Michael Neuling
Date: Mon Aug 14 2017 - 06:43:37 EST


On Tue, 2017-08-08 at 16:07 -0700, Sukadev Bhattiprolu wrote:
> Document the usage of the VAS Fast thread-wakeup API.
>
> Thanks for input/comments from Benjamin Herrenschmidt, Michael Neuling,
> Michael Ellerman, Robert Blackmore, Ian Munsie, Haren Myneni, Paul Mackerras.
>
> Cc:Ian Munsie <imunsie@xxxxxxxxxxx>
> Cc:Paul Mackerras <paulus@xxxxxxxxxx>
> Signed-off-by: Sukadev Bhattiprolu <sukadev@xxxxxxxxxxxxxxxxxx>
> ---
>  Documentation/powerpc/ftw-api.txt | 373 ++++++++++++++++++++++++++++++++++++++
>  1 file changed, 373 insertions(+)
>  create mode 100644 Documentation/powerpc/ftw-api.txt
>
> diff --git a/Documentation/powerpc/ftw-api.txt b/Documentation/powerpc/ftw-api.txt
> new file mode 100644
> index 0000000..0b3f16f
> --- /dev/null
> +++ b/Documentation/powerpc/ftw-api.txt
> @@ -0,0 +1,373 @@
> +Virtual Accelerator Switchboard and Fast Thread-Wakeup API
> +
> +    The Power9 processor supports a hardware subsystem known as the Virtual
> +    Accelerator Switchboard (VAS) which allows two entities in the Power9
> +    system to efficiently exchange messages. Messages must be formatted as
> +    Coprocessor Request Blocks (CRB) and be submitted using the COPY/PASTE
> +    instructions (new in Power9).
> +
> +    Usage of VAS depends on the entities exchanging the messages and
> +    currently two usages have been identified.
> +
> +    The first usage of VAS, referred to as VAS/NX, involves a software thread
> +    submitting data compression requests to a co-processor (hardware/nest
> +    accelerator) aka NX engine. The API for this usage is described in the
> +    VAS/NX API document.
> +
> +    Alternatively, VAS can be used by two software threads to efficiently
> +    exchange messages. Initially, this mechanism is intended to wake up a
> +    waiting thread quickly, i.e. "fast thread wake-up" (FTW). This document
> +    describes the user API for this VAS/FTW mechanism.
> +
> +    Application access to the FTW mechanism is provided through the NX-FTW
> +    device node (/dev/crypto/nx-ftw) implemented by the VAS/FTW device
> +    driver.

crypto?

> +
> +    A software thread T1 that intends to wait for an event must first set up
> +    a receive window, by opening the NX-FTW device and using the
> +    VAS_RX_WIN_OPEN ioctl. Upon successful return from the VAS_RX_WIN_OPEN
> +    ioctl, an rx_win_handle is returned.

I realise there is a window here as part of the hardware implementation, but the
users don't care about the window on the receive side. It's hidden from them.
It's just an rx handle IMHO.

The sender certainly has a window that users care about since they have to mmap
it.

> +
> +    A software thread T2 that intends to wake up T1 at some point, must first
> +    set up a "send window" using the VAS_TX_WIN_OPEN ioctl and specify the
> +    rx_win_handle obtained by T1. After a successful VAS_TX_WIN_OPEN ioctl the
> +    send window of T2 is considered paired with the receive window of T1. The
> +    thread T2 must then use mmap() to obtain a "paste address" for the send
> +    window.


> +    With this set up, thread T1 can wait for an event using the WAIT
> +    instruction.
> +
> +    Thread T2 can wake up T1 by using the "COPY/PASTE" instructions and
> +    submitting an empty/NULL CRB to the send window's paste address. The
> +    wait/wake up process can be repeated as long as the threads have the
> +    send/receive windows open.



> +1. NX-FTW Device Node
> +
> +    There is one /dev/crypto/nx-ftw node in the system and it provides
> +    access to the VAS/FTW functionality.


> +    The only valid operations on the NX-FTW node are:
> +
> +        - open() the device for read and write.
> +
> +        - issue either the VAS_RX_WIN_OPEN or the VAS_TX_WIN_OPEN ioctl to
> +          set up a receive or a send window (only one of them per open).
> +
> +        - if the open is associated with a send window (i.e. the
> +          VAS_TX_WIN_OPEN ioctl was issued), mmap() the send window into the
> +          application's virtual address space (i.e. get a 'paste_address'
> +          for the send window).
> +
> +        - close the device node.
> +
> +    Other file operations on the NX-FTW node are undefined.
> +
> +    Note that the COPY and PASTE operations go directly to the hardware
> +    and do not go through the NX-FTW device.

I don't understand this statement

> +
> +    Although a system may have several instances of the VAS in the system
> +    (typically, one per P9 chip) there is just one NX-FTW device node in
> +    the system.

> +    When the NX-FTW device node is opened, the kernel assigns a suitable
> +    instance of VAS to the process. The kernel will make a best-effort attempt
> +    to assign an optimal instance of VAS for the process. In the initial
> +    release, the kernel does not support migrating the VAS instance if the
> +    process migrates from a processor on one chip to a processor on another
> +    chip.

How is it "optimal"?

> +    Applications may choose a specific instance of the VAS using the 'vas_id'
> +    field in the VAS_TX_WIN_OPEN and VAS_RX_WIN_OPEN ioctls as detailed below.




> +2. Open NX-FTW node
> +
> +    The device should be opened for read and write. No special privileges
> +    are needed to open the device. The device may be opened multiple times.
> +
> +    Each open() of the NX-FTW device may be associated with either a send
> +    window or receive window but not both.
> +
> +    See open(2) system call man pages for other details such as return
> +    values, error codes and restrictions.
> +
> +3. Setup Receive window (VAS_RX_WIN_OPEN ioctl)
> +
> +    A thread that expects to wait for events and be woken up using COPY/PASTE
> +    must first set up a receive window by issuing the VAS_RX_WIN_OPEN ioctl.
> +
> +        #include <asm/vas.h>
> +
> +        struct vas_rx_win_open_attr rxattr;
> +
> +        rc = ioctl(fd, VAS_RX_WIN_OPEN, &rxattr);
> +
> +    The attributes of rxattr are as follows:
> +
> +        struct vas_rx_win_open_attr {
> +                int16_t       version;
> +                int16_t       vas_id;
> +                int32_t       rx_win_handle;    /* output field */
> +                int64_t       reserved[8];
> +        };
> +
> +    The version field identifies the version of the API and must currently
> +    be set to 1.
> +
> +    The vas_id field identifies a specific instance of the VAS that the
> +    application wishes to access. See section on VAS ID below.
> +
> +    The reserved field must be set to all zeroes.
> +
> +    Upon successful return from the ioctl, the rx_win_handle field contains
> +    an identifier for the VAS window associated with this "sleeping" thread.
> +
> +    This rx_win_handle field is used to "pair" this receive window with a
> +    send window and must be specified when opening the corresponding send
> +    window (see struct vas_tx_win_open_attr below).
> +
> +    Return value:
> +
> +    The VAS_RX_WIN_OPEN ioctl returns 0 on success. On error, it returns -1
> +    and sets the errno variable to indicate the error.
> +
> +    Error codes:
> +
> +        EINVAL      version is invalid
> +
> +        EINVAL      vas_id is invalid
> +
> +        EINVAL      reserved field is not set to zeroes
> +
> +        EINVAL      fd is already associated with a send window
> +
> +
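
To make the rx side concrete, here is a minimal sketch of filling in rxattr
the way the text above describes. The struct is a local copy of the layout
quoted in the patch, since <asm/vas.h> from this series may not be installed.
init_rx_attr(), VAS_API_VERSION and VAS_ID_ANY are names made up for
illustration; the patch only says "set to 1" and "may be set to -1".

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local copy of the quoted layout; the real one would come from <asm/vas.h>. */
struct vas_rx_win_open_attr {
        int16_t version;
        int16_t vas_id;
        int32_t rx_win_handle;  /* output field */
        int64_t reserved[8];
};

#define VAS_API_VERSION 1       /* hypothetical name for the documented value 1 */
#define VAS_ID_ANY      (-1)    /* hypothetical name for "kernel chooses" */

/* Zero the whole struct first so reserved[] passes the all-zeroes check. */
static void init_rx_attr(struct vas_rx_win_open_attr *attr, int16_t vas_id)
{
        assert(attr != NULL);
        memset(attr, 0, sizeof(*attr));
        attr->version = VAS_API_VERSION;
        attr->vas_id = vas_id;
}
```

A caller would then issue ioctl(fd, VAS_RX_WIN_OPEN, &attr) and, if that
returns 0, read the handle from attr.rx_win_handle.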
> +3. Set up a Send window (VAS_TX_WIN_OPEN ioctl)
> +
> +    An application thread that expects to wake up a waiting thread using
> +    copy/paste, must first set up a send window that is paired with the
> +    receive window of the waiting thread. This is accomplished using the
> +    VAS_TX_WIN_OPEN ioctl.
> +
> +        #include <asm/vas.h>
> +
> +        struct vas_tx_win_open_attr txattr;
> +
> +        rc = ioctl(fd, VAS_TX_WIN_OPEN, &txattr);

So we talked about this offline before.... the fd here should not be from the
/dev device but should be the fd from rx_win_open ioctl.

As you have it here you pass the handle in as a parameter of ioctl. This means
all the permissions checks have to be done by you as to if these two windows can
be linked. If you use the fd from before, you can assume if the receiver has
given this fd to the sender, it has the right permissions.

I have some pseudo code at the end that shows this.

> +    The attributes 'txattr' for the VAS_TX_WIN_OPEN ioctl are defined as
> +    follows:
> +
> +        struct vas_tx_win_open_attr {
> +                int32_t       version;
> +                int16_t       vas_id;
> +                uint32_t      rx_win_handle;
> +
> +                int64_t       reserved1;
> +
> +                int64_t       flags;
> +                int64_t       reserved2;
> +
> +                int32_t       tc_mode;
> +                int32_t       rsvd_txbuf;
> +                int64_t       reserved3[6];
> +        };
> +
> +    The version field must currently be set to 1.
> +
> +    The vas_id field identifies a specific instance of the VAS that the
> +    application wishes to access. See section on VAS ID below.

Can this be different to the rx?

> +    The rx_win_handle field must be set to the rx_win_handle returned by
> +    a prior successful call to VAS_RX_WIN_OPEN ioctl (see above). This
> +    field is used to pair this send window with a receive window. The
> +    process must have sufficient permissions to communicate with the
> +    process owning the receive window identified by rx_win_handle.

As above, this should be part of the FD otherwise users could specify anything
here and paste to anyone.

> +    The tc_mode and rsvd_txbuf fields are currently unused and must be
> +    set to 0.
> +
> +    The flags field specifies additional attributes to the window. The
> +    only valid bit in the flags field for FTW windows is:
> +
> +        VAS_FLAGS_PIN_WINDOW    if set, indicates that the window should be
> +                                pinned in cache. This flag is restricted
> +                                to privileged users. See Pinning windows
> +                                below.
> +
> +    All the other bits in the flags field must be set to 0.
> +
> +    The fields reserved1, reserved2 and reserved3 are for future extension
> +    and must be set to 0.
> +
> +    Return value:
> +
> +    The VAS_TX_WIN_OPEN ioctl returns 0 on success. On error, it returns -1
> +    and sets the errno variable to indicate the error.
> +
> +    Error conditions:
> +
> +        EINVAL      version, vas_id or rx_win_handle fields are invalid
> +
> +        EINVAL      fd does not refer to a valid VAS device.
> +
> +        EINVAL      fd is already associated with a receive window
> +
> +        ENOSPC      System has too many active windows (connections) open.
> +
> +        EINVAL      For FTW windows, rsvd_txbuf is not 0.
> +
> +        EINVAL      For FTW windows, tc_mode is not VAS_THRESH_DISABLED.
> +
> +        EPERM       VAS_FLAGS_PIN_WINDOW is set in 'flags' field and process
> +                    is not privileged.
> +
> +        EPERM       VAS_FLAGS_HIGH_PRI is set in 'flags' field and process
> +                    is not privileged.
> +
> +        EINVAL      an invalid flag is set in the 'flags' field. (For FTW
> +                    windows, VAS_FLAGS_HIGH_PRI is also invalid).
> +
> +        EINVAL      reserved fields are not set to 0.
> +
> +    See the ioctl(2) man page for more details, error codes and restrictions.
> +
> +4. mmap() NX-FTW device fd
> +
> +    The mmap() system call for an NX-FTW device fd returns a "paste address"
> +    that the application can use to COPY/PASTE a CRB to the waiting thread.
> +
> +        paste_addr = mmap(NULL, size, prot, flags, fd, offset);
> +
> +    The mmap() operation is only valid on a file descriptor associated
> +    with a send window.
> +
> +    The only restrictions on mmap() for an NX-FTW device fd are:
> +
> +        - size parameter should be one page size
> +
> +        - offset parameter should be 0ULL.
> +
> +    Refer to mmap(2) man page for additional details/restrictions.
> +
> +    In addition to the error conditions listed on the mmap(2) man page,
> +    mmap() can also fail with one of following error codes:
> +
> +        EINVAL      fd is not associated with an open send window (i.e. mmap()
> +                    does not follow a successful call to the VAS_TX_WIN_OPEN
> +                    ioctl).
> +
> +        EINVAL      offset field is not 0ULL.
> +
> +
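
For the mmap() step, a small wrapper that hard-codes the two documented
restrictions (one page, offset 0) might look like the sketch below.
map_paste_addr() is a hypothetical helper; PROT_READ | PROT_WRITE and
MAP_SHARED are assumptions, since the patch only says "prot" and "flags".

```c
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map exactly one page at offset 0 from a send-window fd.
 * Returns the paste address, or NULL on failure with errno set by mmap().
 */
static void *map_paste_addr(int tx_fd)
{
        long page = sysconf(_SC_PAGESIZE);
        void *addr;

        if (page <= 0)
                return NULL;
        addr = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED,
                    tx_fd, 0);
        return addr == MAP_FAILED ? NULL : addr;
}
```

Converting MAP_FAILED to NULL keeps the caller's error check to a single
pointer test.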
> +5. VAS ID
> +
> +    A system may have several instances of VAS in the hardware, typically
> +    one per Power9 chip. The choice of a specific instance of VAS can have
> +    significant impact on the performance, especially if the application
> +    migrates from one CPU to another. Applications can specify a vas_id
> +    using the VAS_TX_WIN_OPEN and VAS_RX_WIN_OPEN ioctls and should be
> +    prudent in choosing an instance of VAS.
> +
> +    The vas_id for each instance of VAS is listed as the device tree
> +    property 'ibm,vas-id'. Determining the specific vas_id to use for
> +    a specific application thread is beyond the scope of this API.

I would lean towards having 1 device per vas/chip but I'll defer to mpe and benh
on the best option here.

you planning a libftw to do this?

> +
> +    If the application has no preference, the vas_id field may be set to
> +    -1 and the kernel will choose a suitable instance of the VAS engine.

+1

> +6. COPY/PASTE operations:
> +
> +    Applications should use the COPY and PASTE instructions defined in
> +    the RFC to copy/paste the CRB. For VAS/FTW usage, the contents of
> +    CRB if any, are ignored. CRB can be NULL.
> +
> +7. Interrupt completion and signal handling
> +
> +    No VAS-specific signals will be generated to the application threads
> +    with the VAS/FTW usage.

+1

> +
> +
> +8. Example/Proposed usage of the VAS/FTW API
> +
> +    In the following example we use two threads that use the VAS/FTW API.
> +    Thread T1 uses the WAIT instruction to wait for an event. Thread T2
> +    uses copy/paste instructions to wake up T1.

So here's how pseudo code for my idea would look with pthreads.

I've also added some memory barriers. The ISA suggests that copy/paste has no
ordering associated with it, so you are going to need them I think. I'm not sure
of the flavour though.

---
bool done = false;
int rxfd;

static void receiver(void)
{
        do {
                asm("wait");
                smp_mb();       /* needed for wait -> memory */
        } while (!done);        /* check for spurious wakeup */
        /* woken up! */
}

/* pthread start routines take and return a void * */
static void *sender(void *arg)
{
        void *paste_addr;

        /* mmap the rx file descriptor */
        paste_addr = mmap(NULL, getpagesize(), prot, MAP_SHARED, rxfd, 0);

        done = true;
        smp_mb();               /* needed for memory -> paste */
        write_crb(paste_addr);
        return NULL;
}

int main()
{
        pthread_t thread;
        int devfd;

        devfd = open("/dev/vas-ftw", O_RDWR);

        /* create a new rx file descriptor associated with this LPID/PID/TID */
        rxfd = ioctl(devfd, VAS_RX_CREATE);

        pthread_create(&thread, NULL, sender, NULL);

        /*
         * Receiver must *not* be a new thread since VAS_RX_CREATE
         * ioctl is associated with this LPID/PID/TID
         */
        receiver();
}
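
The flag-plus-barrier part of the pseudo code above can be tried on any
machine, without VAS hardware. The sketch below is a portable stand-in, not
the real mechanism: C11 release/acquire atomics replace smp_mb(),
sched_yield() replaces the wait instruction, and the paste is dropped
entirely. run_wakeup_demo() is a made-up name. It says nothing about which
barrier flavour copy/paste really needs; it only shows the "recheck the flag
after every wakeup" shape.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool done;

/* Stand-in for the sender: the real code would paste a CRB after the store. */
static void *sender(void *arg)
{
        (void)arg;
        /* release: writes made before this are visible once done reads true */
        atomic_store_explicit(&done, true, memory_order_release);
        return NULL;
}

/* Stand-in for the wait loop: sched_yield() takes the place of asm("wait"). */
static void receiver(void)
{
        while (!atomic_load_explicit(&done, memory_order_acquire))
                sched_yield();  /* returning with done unset counts as spurious */
}

/* Returns 0 once the receiver has seen the flag, non-zero on thread errors. */
static int run_wakeup_demo(void)
{
        pthread_t thread;

        atomic_store(&done, false);
        if (pthread_create(&thread, NULL, sender, NULL) != 0)
                return -1;
        receiver();
        return pthread_join(thread, NULL);
}
```

As in the pseudo code, the receiver runs on the calling thread and only the
sender is a new thread.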