Re: [PATCH 0/3] Ceph fscache: Fix kernel panic due to a race

From: Li Wang
Date: Fri Dec 27 2013 - 22:51:33 EST


Hi Milosz,
As far as I know, fscache does not currently act as a write cache
for Ceph, except for the call to ceph_readpage_to_fscache() in
ceph_writepage(), and that is unrelated to our test case. According
to our observation, the test case never goes through ceph_writepage();
instead, it goes through ceph_writepages(). In other words, I do not
think this is related to caching in the write path.
Let me explain the panic in more detail:

(1) dd if=/dev/zero of=cephfs/foo bs=8 count=512
(2) echo 3 > /proc/sys/vm/drop_caches
(3) dd if=cephfs/foo of=/dev/null bs=8 count=1024

Statement (1) repeatedly appends to the file, so ceph_aio_write()
repeatedly updates inode->i_size; however, these updates are not
immediately reflected in object->store_limit_l. In statement (3),
when we start reading the second page at [4096, 8192), Ceph finds
that the page is not cached in fscache and decides to write it into
fscache. During that process, cachefiles_write_page() finds that
object->store_limit_l < 4096 (page->index << 12), which causes the
panic. Does that make sense?
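
To make the numbers concrete, here is a small stand-alone model of
that check (illustrative only: it just replays the arithmetic above;
the PAGE_SHIFT of 12 and the initial store limit of 0 are
assumptions, and this is not the actual cachefiles code):

#include <stdio.h>

#define PAGE_SHIFT 12	/* assuming 4 KiB pages */

int main(void)
{
	/* step (1): dd appends 8 * 512 = 4096 bytes, so i_size grows to 4096 */
	long long i_size = 8 * 512;

	/*
	 * The store limit was captured when the fscache object was set up
	 * and, per the race described above, has not been refreshed since;
	 * 0 is an assumed stale value.
	 */
	long long store_limit_l = 0;

	/* step (3): fscache is asked to store the page covering [4096, 8192) */
	long long page_index = 1;
	long long pos = page_index << PAGE_SHIFT;	/* 4096 */

	printf("i_size=%lld store_limit_l=%lld pos=%lld\n",
	       i_size, store_limit_l, pos);

	/* the cache backend only accepts writes below the store limit */
	if (pos >= store_limit_l)
		printf("store limit check fails here -> the ASSERT/panic\n");

	return 0;
}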

Cheers,
Li Wang

On 2013/12/27 6:51, Milosz Tanski wrote:
Li,

I looked at the patchset; am I correct that this only happens when we
enable caching in the write path?

- Milosz

On Thu, Dec 26, 2013 at 9:29 AM, Li Wang <liwang@xxxxxxxxxxxxxxx> wrote:
From: Yunchuan Wen <yunchuanwen@xxxxxxxxxxxxxxx>

The following script can easily panic the kernel:

#!/bin/bash
mount -t ceph -o fsc MONADDR:/ cephfs
rm -rf cephfs/foo
dd if=/dev/zero of=cephfs/foo bs=8 count=512
echo 3 > /proc/sys/vm/drop_caches
dd if=cephfs/foo of=/dev/null bs=8 count=1024

This happens because, when writing a page into fscache, the code
asserts that the write position does not exceed
object->store_limit_l, which is supposed to equal inode->i_size.
However, in the current implementation, object->store_limit_l is not
synchronized with the new inode->i_size immediately after a file
write. This introduces a race: writing a new page into fscache can
hit the ASSERT because the write position has exceeded
object->store_limit_l, causing a kernel panic.
This patch series fixes it.
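
As a rough illustration of what synchronizing the object store limit
could look like with the old fscache API (a minimal sketch; the
helper name ceph_fscache_update_objectsize, its placement, and the
call site are assumptions about the patches below, and the field and
API names are from memory):

/* fs/ceph/cache.h -- sketch only, the helper name is hypothetical */
static inline void ceph_fscache_update_objectsize(struct inode *inode)
{
	struct ceph_inode_info *ci = ceph_inode(inode);

	/*
	 * fscache_attr_changed() asks the cache backend to re-read the
	 * object's attributes (including the size), which lets cachefiles
	 * refresh object->store_limit_l.
	 */
	fscache_attr_changed(ci->fscache);
}

/* fs/ceph/file.c -- after a write that may have extended i_size */
if (written > 0)
	ceph_fscache_update_objectsize(inode);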

Yunchuan Wen (3):
Ceph fscache: Add an interface to synchronize object store limit
Ceph fscache: Update object store limit after writing
Ceph fscache: Wait for completion of object initialization

fs/ceph/cache.c | 1 +
fs/ceph/cache.h | 10 ++++++++++
fs/ceph/file.c | 3 +++
3 files changed, 14 insertions(+)

--
1.7.9.5



