regression introduced by "block: Add support for DAX reads/writes to block devices"

From: Jeff Moyer
Date: Wed Aug 05 2015 - 16:19:18 EST


Hi, Matthew,

Linda Knippers noticed that commit bbab37ddc20b ("block: Add support
for DAX reads/writes to block devices") breaks mkfs.xfs:

# mkfs -t xfs -f /dev/pmem0
meta-data=/dev/pmem0             isize=256    agcount=4, agsize=524288 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=0        finobt=0
data     =                       bsize=4096   blocks=2097152, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
mkfs.xfs: read failed: Numerical result out of range

I sat down with Linda to look into it, and the problem is that mkfs.xfs
sets the blocksize of the device to 512 (via BLKBSZSET), and then reads
from the last sector of the device. This results in dax_io trying to do
a page-sized I/O at 512 bytes from the end of the device.
bdev_direct_access, receiving this bogus pos/size combo, returns
-ERANGE:

	if ((sector + DIV_ROUND_UP(size, 512)) >
					part_nr_sects_read(bdev->bd_part))
		return -ERANGE;
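
To make the arithmetic concrete, here is a minimal user-space sketch of
that check. It is not the kernel code: check_range() and
example_nr_sects() are hypothetical names, and the device size is an
assumption taken from the mkfs output above (2097152 blocks of 4096
bytes), so the real device may be slightly larger.

```c
#include <errno.h>
#include <stdint.h>

#define SECTOR_SIZE 512u
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Hypothetical stand-in for the range check in bdev_direct_access().
 * nr_sects plays the role of part_nr_sects_read(bdev->bd_part).
 */
static int check_range(uint64_t sector, uint64_t size, uint64_t nr_sects)
{
	if (sector + DIV_ROUND_UP(size, SECTOR_SIZE) > nr_sects)
		return -ERANGE;
	return 0;
}

/*
 * Assumed device size in 512-byte sectors, derived from the mkfs
 * output (blocks=2097152, bsize=4096). Illustration only.
 */
static uint64_t example_nr_sects(void)
{
	return 2097152ull * 4096 / SECTOR_SIZE;
}
```

With nr_sects = example_nr_sects(), a page-sized request starting at the
last sector, check_range(nr_sects - 1, 4096, nr_sects), extends seven
sectors past the end of the device and returns -ERANGE. The same
request with size 512, or a page-sized request starting at
nr_sects - 8, passes the check.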

Given that file systems supporting dax refuse to mount with a blocksize
!= page size, I'm guessing this is sort of expected behavior. However,
we really shouldn't be breaking direct I/O on pmem devices.

So, what do you want to do? We could make the pmem device's logical
block size fixed at the system page size. Or, we could modify the dax
code to work with blocksize < pagesize. Or, we could continue using the
direct I/O codepath for direct block device access. What do you think?

Thanks,
Jeff and Linda