Re: dm: dm-cache fails to write the cache device in writethrough mode
From: Mike Snitzer
Date: Sat Mar 23 2013 - 17:09:29 EST
On Sat, Mar 23 2013 at 1:15am -0400,
Darrick J. Wong <darrick.wong@xxxxxxxxxx> wrote:
> On Fri, Mar 22, 2013 at 11:27:16PM -0400, Mike Snitzer wrote:
> > On Fri, Mar 22 2013 at 7:16pm -0400,
> > Darrick J. Wong <darrick.wong@xxxxxxxxxx> wrote:
> > > On Fri, Mar 22, 2013 at 06:34:28PM -0400, Mike Snitzer wrote:
> > > > On Fri, Mar 22 2013 at 4:11pm -0400,
> > > > Darrick J. Wong <darrick.wong@xxxxxxxxxx> wrote:
> > > >
> > > > > The new writethrough strategy for dm-cache issues a bio to the origin device,
> > > > > remaps the bio to the cache device, and issues the bio to the cache device.
> > > > > However, the block layer modifies bi_sector and bi_size, so we need to preserve
> > > > > these or else nothing gets written to the cache (bi_size == 0). This fixes the
> > > > > problem where someone writes a block through the cache, but a subsequent reread
> > > > > (from the cache) returns old contents.
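
For a request-based driver like sd, the block layer completes a bio by
advancing bi_sector and decrementing bi_size as segments finish (see
req_bio_endio() in block/blk-core.c), so bi_end_io runs with bi_size == 0
and bi_sector pointing past the end of the I/O.  A rough sketch of how
that breaks reusing the bio -- simplified rather than verbatim dm-cache
code, with get_per_bio_data(), remap_to_cache() and the per_bio_data
fields following dm-cache's naming:

        static void writethrough_endio(struct bio *bio, int err)
        {
                struct per_bio_data *pb = get_per_bio_data(bio);

                bio->bi_end_io = pb->saved_bi_end_io;

                /*
                 * The origin write has completed, so bi_sector has been
                 * advanced and bi_size is now 0.
                 */
                remap_to_cache(pb->cache, bio, pb->cblock);  /* retargets bi_bdev/bi_sector */
                generic_make_request(bio);  /* bi_size == 0: zero-length write to the cache */
        }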
> > > >
> > > > Your writethrough blkid test results are certainly strange. But I'm not
> > > > aware of where the block layer would modify bi_size and bi_sector;
> > > > please elaborate.
> > > >
> > > > I cannot reproduce your original report. I developed
> > > > 'test_writethrough_ext4_uuids_match', apologies for the ruby code:
> > >
> > > Hmm... I'm building my kernels off 0a7e453103b9718d357688b83bb968ee108cc874 in
> > > Linus' tree (post 3.9-rc3). This is the full output of dmsetup table:
> > >
> > > moocache-blocks: 0 1039360 linear 8:16 9088
> > > moocache-metadata: 0 8704 linear 8:16 384
> > > moocache: 0 67108864 cache 253:0 253:1 8:0 512 1 writethrough default 4 random_threshold 4 sequential_threshold 32768
> > >
> > > 253:0 -> moocache-metadata and 253:1 -> moocache-blocks.
> > >
> > > I'm curious what your setup is...
> > Here are the tables:
> > test-dev-238267: 0 8192 linear /dev/stec/metadata 0
> > test-dev-255913: 0 2097152 linear /dev/stec/metadata 8192
> > test-dev-655144: 0 20480 linear /dev/spindle/data 0
> > 0 20480 cache /dev/mapper/test-dev-238267 /dev/mapper/test-dev-255913 /dev/mapper/test-dev-655144 512 1 writethrough default 0
> > And I tweaked 'test_writethrough_ext4_uuids_match' to use the same
> > thresholds you're using; full status output:
> > 0 20480 cache 15/1024 0 19 0 0 0 0 0 0 1 writethrough 2 migration_threshold 32768 4 random_threshold 4 sequential_threshold 512
> > So the big difference is that the thinp-test-suite uses intermediate
> > linear DM layers above the slower sd device (spindle/data) -- whereas in
> > your setup the origin device maps directly to sd (8:0).
> > I'll re-run with the origin directly on sd in the morning and will
> > report back.
> Interesting ... if I set up this:
> # echo "0 67108864 linear /dev/sda 0" | dmsetup create origin
> And then repeat the test, but using /dev/mapper/origin as the origin instead
> of /dev/sda, the problem goes away.
Using the extra dm-linear layer is implicitly leveraging the DM core's
bio cloning to restore the original bio that was sent to the linear
target: only the clone travels through the request layer, so the bio
dm-cache submitted comes back with bi_sector and bi_size intact.
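
That same snapshot idea exists as a helper pair, dm_bio_record() and
dm_bio_restore() in drivers/md/dm-bio-record.h (dm-raid1 uses them when
retrying failed writes).  A sketch of wiring it into the writethrough
submission path -- the per_bio_data layout and function shapes here
follow dm-cache's naming but are illustrative, not a tested patch:

        #include "dm-bio-record.h"

        struct per_bio_data {
                bio_end_io_t *saved_bi_end_io;
                struct cache *cache;
                dm_cblock_t cblock;
                struct dm_bio_details bio_details;  /* pristine bi_sector/bi_size/bi_idx */
        };

        static void remap_to_origin_then_cache(struct cache *cache,
                                               struct bio *bio,
                                               dm_oblock_t oblock,
                                               dm_cblock_t cblock)
        {
                struct per_bio_data *pb = get_per_bio_data(bio);

                pb->cache = cache;
                pb->cblock = cblock;
                pb->saved_bi_end_io = bio->bi_end_io;
                dm_bio_record(&pb->bio_details, bio);  /* before the block layer touches it */
                bio->bi_end_io = writethrough_endio;

                remap_to_origin(cache, bio);
        }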
But even after having changed my test to use /dev/sdb for the origin
device, I cannot reproduce the problem you've reported.  Do you have
any further details on how/why the bios are being altered?  Are you
reliably hitting partial completions within the origin's driver?  If
so, that would explain the bi_sector and bi_size changes you're seeing.
Having looked at this for a bit, it seems pretty clear that
writethrough_endio is missing partial completion handling, e.g.:

        if (!bio_flagged(bio, BIO_UPTODATE) && !err)
                err = -EIO;

But I haven't yet settled on what the partial completion handling
implementation needs to be for the writethrough support.
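
One plausible shape for it -- an untested sketch building on the
dm_bio_record() snapshot above, where defer_writethrough_bio() is an
assumed helper that hands the bio to the worker thread (an endio runs
in interrupt context, so it cannot resubmit directly):

        static void writethrough_endio(struct bio *bio, int err)
        {
                struct per_bio_data *pb = get_per_bio_data(bio);

                bio->bi_end_io = pb->saved_bi_end_io;

                if (err) {
                        bio_endio(bio, err);
                        return;
                }

                /*
                 * Undo the block layer's completion bookkeeping: put
                 * bi_sector, bi_size and bi_idx back so the bio can be
                 * reissued in full to the cache device.
                 */
                dm_bio_restore(&pb->bio_details, bio);
                remap_to_cache(pb->cache, bio, pb->cblock);

                /* Reissue from process context via the worker thread. */
                defer_writethrough_bio(pb->cache, bio);
        }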