Re: [PATCH] swap: Fix swap size in case of block devices

From: Hugh Dickins
Date: Tue Sep 01 2009 - 05:24:29 EST

On Tue, 1 Sep 2009, Nitin Gupta wrote:
> On Tue, Sep 1, 2009 at 12:56 AM, Hugh Dickins<hugh.dickins@xxxxxxxxxxxxx> wrote:
> > On Mon, 31 Aug 2009, Nitin Gupta wrote:
> >> For block devices, setup_swap_extents() leaves p->pages untouched.
> >> For regular files, it sets p->pages
> >> Â Â Â == total usable swap pages (including header page) - 1;
> >
> > I think you're overlooking the "page < sis->max" condition
> > in setup_swap_extents()'s loop. ÂSo at the end of the loop,
> > if no pages were lost to fragmentation, we have
> >
> > Â Â Â Â Â Â Â Âsis->max = page_no; Â Â Â Â Â Â /* no change */
> > Â Â Â Â Â Â Â Âsis->pages = page_no - 1; Â Â Â /* no change */
> >
> Oh, I missed this loop condition. The variable naming is so bad, I
> find it very hard to follow this part of code.
> Still, if there is even a single page in swap file that is not usable
> (i.e. non-contiguous on disk) -- which is what usually happens for swap
> files of any practical size -- setup_swap_extents() gives correct value
> in sis->pages == total usable pages (including header) - 1;
> However, if all the file pages are usable, it gives off-by-one error, as
> you noted.

Right, I see your point now: when the regular file is fragmented thus,
setup_swap_extents() would allow it to use the final page of the file,
which would otherwise be (erroneously) disallowed.

But I would reword your "what usually happens" to "what happens in
the general case": perhaps I'm wrong, but I think that usually these
days people are creating swap files on filesystems with 4kB block
size, where there's no issue of intra-page fragmentation lowering
that page count (but there may still be inter-page fragmentation
to make swapping to the file less efficient than to a partition).

> > Yes, I'd dislike that discrepancy between regular files and block
> > devices, if I could see it. Though I'd probably still be cautious
> > about the disk partitions.
> > dd if=/dev/zero of=/swap bs=200k # says 204800 bytes (205kB)
> > mkswap /swap # says size = 196 KiB
> > swapon /swap # dmesg says Adding 192k swap
> > which is what I've come to expect from the off-by-one,
> > even on regular files.
> In general, its not correct to compare size repored by mkswap and
> swapon like this. The size reported by mkswap includes pages which
> are not contiguous on disk. While, kernel considers only
> PAGE_SIZE-length, PAGE_SIZE-aligned contiguous run of blocks. So, size
> reported by mkswap and swapon can vary wildly. For e.g.:
> (on mtdram with ext2 fs)
> dd if=/dev/zero of=swap.dd bs=1M count=10
> mkswap swap.dd # says size = 10236 KiB
> swapon swap.dd # says Adding 10112k swap

If the filesystem has block size 1kB or 2kB, yes.

> ====
> So, to summarize:
> 1. mkswap always behaves correctly: It sets number of pages in swap file
> minus one as 'last_page' in swap header (since this is a 0-based index).
> This same value (total pages - 1) is printed out as size since it knows
> that first page is swap header.
> 2. swapon() for block devices: off-by-one error causing last swap page
> to remain unused.
> 3. swapon() for regular files:
> 3.1 off-by-one error if every swap page in this file is usable i.e.
> every PAGE_SIZE-length, PAGE_SIZE-aligned chunk is contiguous on
> disk.
> 3.2 correct size value if there is at least one swap page which is
> unusable -- which is expected from swap file of any practical
> size.
> I will go through swap code again to find other possible off-by-one
> errors. The revised patch will fix these inconsistencies.