Re: [PATCH v2] mm, THP, swap: fix allocating cluster for swapfile by mistake

From: Matthew Wilcox
Date: Thu Aug 20 2020 - 07:38:04 EST


On Thu, Aug 20, 2020 at 12:53:23PM +0800, Gao Xiang wrote:
> SWP_FS is used to make swap_{read,write}page() go through
> the filesystem, and it's only used for swap files over
> NFS. So, for now, !SWP_FS only means non-NFS: the swap area
> could be either file-backed or device-backed. The same goes
> for the legacy SWP_FILE.
>
> So in order to achieve the goal of the original patch,
> SWP_BLKDEV should be used instead.

This is clearly confusing. I think we need to rename SWP_FS to SWP_FS_OPS.
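
To spell out the trap, here is a minimal sketch. The flag values are
made up for illustration (they are not copied from swap.h), SWP_FS_OPS
is only the proposed name, and both helper functions are hypothetical:

```c
#include <stdbool.h>

/* Illustrative bits only; the real values live in include/linux/swap.h. */
enum {
	SWP_BLKDEV = 1u << 0,	/* swap area is a block device */
	SWP_FS_OPS = 1u << 1,	/* proposed name for SWP_FS: swap I/O goes
				 * through the filesystem (NFS today) */
};

/* The mistaken reading: "not SWP_FS" taken to mean "block device". */
static bool looks_like_blkdev_wrong(unsigned int flags)
{
	return !(flags & SWP_FS_OPS);
}

/* What the original patch wanted to ask. */
static bool is_blkdev(unsigned int flags)
{
	return (flags & SWP_BLKDEV) != 0;
}
```

A swapfile on a local filesystem has neither bit set in this model, so
the first test claims it is a block device and the second does not.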

More generally, the swap code seems insane. I appreciate that it's an
inherited design from over twenty-five years ago, and nobody wants to
touch it, but it's crazy that it cares about how the filesystem has
mapped file blocks to disk blocks. I understand that the filesystem
has to know not to allocate memory in order to free memory, but this
is already something filesystems have to understand. It's also useful
for filesystems to know that this is data which has no meaning after a
power cycle (so it doesn't need to be journalled or snapshotted or ...),
but again, that's useful functionality which we could stand to present
to userspace anyway.

I suppose the tricky thing about it is that working on the swap code is
not as sexy as working on a filesystem, and doing the swap code right
is essentially writing a filesystem, so everybody who's capable already
has something better to do.

Anyway, Gao, please can you submit a follow-on patch to rename SWP_FS?