On Fri, 2025-06-20 at 02:01 +0300, Nadav Amit wrote:
+
+static void set_payload(struct rar_payload *p, u16 pcid, unsigned
long start,
+ long pages)
+{
+ p->must_be_zero_1 = 0;
+ p->must_be_zero_2 = 0;
+ p->must_be_zero_3 = 0;
+ p->page_size = RAR_INVLPG_PAGE_SIZE_4K;
I think you can propagate the stride to this point instead of using
fixed 4KB stride.
Agreed. That's another future optimization, for
once this code all works right.
Currently I am in a situation where the receiving
CPU clears the action vector from RAR_PENDING to
RAR_SUCCESS, but the TLB does not appear to always
be correctly flushed.
+ p->type = RAR_TYPE_INVPCID;
+ p->pcid = pcid;
+ p->linear_address = start;
+
+ if (pcid) {
+ /* RAR invalidation of the mapping of a specific
process. */
+ if (pages < RAR_INVLPG_MAX_PAGES) {
+ p->num_pages = pages;
+ p->subtype = RAR_INVPCID_ADDR;
+ } else {
+ p->subtype = RAR_INVPCID_PCID;
I wonder whether it would be safer to set something to p->num_pages.
(then we can do it unconditionally)
We have a limited number of bits available for
p->num_pages. I'm not sure we want to try
writing a larger number than what fits in those
bits.
We need protection against two different things here.
+ }
+ } else {
+ /*
+ * Unfortunately RAR_INVPCID_ADDR excludes global
translations.
+ * Always do a full flush for kernel
invalidations.
+ */
+ p->subtype = RAR_INVPCID_ALL;
+ }
+
+ /* Ensure all writes are visible before the action entry
is set. */
+ smp_wmb();
Maybe you can drop the smp_wmb() here and instead change the
WRITE_ONCE() in set_action_entry() to smp_store_release() ? It should
have the same effect and I think would be cleaner and convey your
intent
better.
1) Receiving CPUs must see all the writes done by
the originating CPU before we send the RAR IPI.
2) Receiving CPUs must see all the writes done by
set_payload() before the write done by
set_action_entry(), in case another CPU sends
the RAR IPI before we do.
That other RAR IPI could even be sent between
when we write the payload, and when we write
the action entry. The receiving CPU could take
long enough processing other RAR payloads that
it can see our action entry after we write it.
Does removing the smp_wmb() still leave everything
safe in that scenario?