Re: [PATCH v5 4/5] dax: fix mapping lifetime handling, convert to __pfn_t + kmap_atomic_pfn_t()

From: Dan Williams
Date: Thu Aug 13 2015 - 11:21:09 EST


On Wed, Aug 12, 2015 at 11:26 PM, Boaz Harrosh <boaz@xxxxxxxxxxxxx> wrote:
> Boooo. This whole set is a joke. The whole "pmem disable vs still-in-use" argument is moot:
> below, you have inserted a live pfn, used forever, into a process VM without holding
> a map.

Careful, don't confuse "unbind" with "unplug". "Unbind" invalidates
the driver's mapping (ioremap) while "unplug" would invalidate the
pfn. DAX is indeed broken with respect to unplug and we'll need to go
solve that separately. I expect "unplug" support will be needed for
hot provisioning pmem to/from virtual machines.

> The whole "pmem disable vs still-in-use" argument is a joke. The loaded FS holds a reference on the bdev,
> and the file handle holds a reference on the FS. So what exactly is this "pmem disable" you are
> talking about?

Hmm, that's not the same block layer I've been working with for the
past several years:

$ mount /dev/pmem0 /mnt
$ echo namespace0.0 > ../drivers/nd_pmem/unbind # succeeds

Unbind always proceeds unconditionally. See the recent kernel summit
topic discussion around devm vs unbind [1]. While kmap_atomic_pfn_t()
does not implement revoke semantics, it at least forces re-validation
and time-bounded references. For the unplug case we'll need to shoot
down those DAX mappings in userspace so that they return SIGBUS on
access, or something along those lines.

[1]: http://www.spinics.net/lists/kernel/msg2032864.html

> And for God's sake. I have a bdev, I call bdev_direct_access(sector), and the bdev calculates the
> exact address for me (base + sector). Now I get back this __pfn_t and I need to call
> kmap_atomic_pfn_t(), which does a loop to search for my range and again computes base + offset?
>
> This whole model is broken, sorry.

I think you are confused about the lifetime of the userspace DAX
mapping vs the kernel's mapping and the frequency of calls to
kmap_atomic_pfn_t(). I'm sure you can make this loop look bad with a
micro-benchmark, but the whole point of DAX is to get the kernel out
of the I/O path, so I'm not sure this overhead shows up in any real
way in practice.