[PATCHSET 2.6.22-rc4] sysfs: fix race conditions

From: Tejun Heo
Date: Mon Jun 11 2007 - 01:01:24 EST


Hello, all.

Currently, there are several race conditions around dentry/inode
reclamation.

a. sysfs_dirent->s_dentry dereferencing in sysfs_readdir()

b. sysfs_dirent->s_dentry dereferencing in sysfs_drop_dentry()

c. sysfs_dirent->s_dentry clearing in sysfs_d_iput()

All aboves are done without synchronization and can cause oops if the
timing is right (or wrong).

These race conditions are difficult to trigger but with the attached
patch (sysfs-races.patch) and the following commands running
parallelly, all three are reliably reproducible (you may have to
change timings or disable others to trigger specific one).

1. while true; do insmod drivers/ata/libata.ko; insmod drivers/ata/ata_piix.ko; sleep 1; rmmod ata_piix; rmmod libata; sleep 1; echo -n . ; done
2. while true; do find /sys -type f | xargs cat > /dev/null; echo -n .; sleep 1; done
3. while true; do find /sys/class/scsi_disk -type f | sort | xargs cat > /dev/null; echo -n .; sleep 1; done
4. while true; do umount /sys; sleep 1; mount /sys; sleep 1; echo -n .; done

#1 assumes there are several devices attached to ata_piix controller.
Change #1 to any module or command which creates and removes sysfs
nodes repeatedly and adjust #3 to cat those sysfs nodes.

All known race conditions are fixed in the current -mm. #a is
replaced by adding sd->s_ino and allocating unique ino with ida. #b
and #c are fixed during sysfs_drop_dentry() rewrite. However, those
changes were too big to apply to 2.6.22-rcX or any stable branches.

This patchset contains three minimal backports of fixes in -mm. With
all patches in the patchset and sysfs-races.patch applied, kernel
survived ~20 hours of stress test without any problem.

Thanks.

--
tejun
---
fs/sysfs/dir.c | 7 +++++--
fs/sysfs/inode.c | 2 ++
2 files changed, 7 insertions(+), 2 deletions(-)

Index: work1/fs/sysfs/dir.c
===================================================================
--- work1.orig/fs/sysfs/dir.c
+++ work1/fs/sysfs/dir.c
@@ -18,6 +18,8 @@ static void sysfs_d_iput(struct dentry *
{
struct sysfs_dirent * sd = dentry->d_fsdata;

+ udelay(10);
+
if (sd) {
BUG_ON(sd->s_dentry != dentry);
sd->s_dentry = NULL;
@@ -538,9 +540,10 @@ static int sysfs_readdir(struct file * f

name = sysfs_get_name(next);
len = strlen(name);
- if (next->s_dentry)
+ if (next->s_dentry) {
+ msleep(1);
ino = next->s_dentry->d_inode->i_ino;
- else
+ } else
ino = iunique(sysfs_sb, 2);

if (filldir(dirent, name, len, filp->f_pos, ino,
Index: work1/fs/sysfs/inode.c
===================================================================
--- work1.orig/fs/sysfs/inode.c
+++ work1/fs/sysfs/inode.c
@@ -248,6 +248,8 @@ void sysfs_drop_dentry(struct sysfs_dire
struct dentry * dentry = sd->s_dentry;
struct inode *inode;

+ msleep(1);
+
if (dentry) {
spin_lock(&dcache_lock);
spin_lock(&dentry->d_lock);