[BUGS] [CHECKER] 99 synchronization bugs and a lock summary database

From: Yichen Xie
Date: Thu Jul 01 2004 - 20:02:28 EST


Hi all,

We are a group of researchers at Stanford working on program analysis
algorithms. We have been building a precision enhanced program analysis
engine at Stanford, and our first application was to derive mutex/lock
behavior in the linux kernel. In the process, we found 99 likely
synchronization errors in linux kernel version 2.6.5:

http://glide.stanford.edu/linux-lock/err1.html (69 errors)
http://glide.stanford.edu/linux-lock/err2.html (30 errors)

err1.html consists of potential double locks/unlocks. Acquiring a lock
twice in a row may result in a system hang, and releasing a lock more than
once with certain mutex functions (e.g. up) may cause critical section
violations.

err2.html consists of reports on inconsistent output lock states on
function exit. These errors usually correspond to missed lock operations
on error paths. (filenames in this report correspond to where a function
is declared, so CTAGS may come in handy to find the actual implementation
of the function).

In the error reports, functions are hyperlinked to their derived
summaries, and those of their callees (since the analysis spans function
calls, the error condition of a particular function usually depend on the
locking behavior of its callees).

For example, in function "radeon_pm_program_v2clk" (first error report in
err1.html), the tool flagged an error at line 323 of
"drivers/video/aty/radeon_pm.c". Line 323 invokes two macros, OUTPLL, and
INPLL. OUTPLL acquires "rinfo->reg_lock", and then evaluates "addr", which
is calculated, in this case, by calling _INPLL. By clicking on the link
"drivers/video/aty/radeon_pm.c:radeon_pm_program_v2clk", we can see that
_INPLL requires "rinfo->reg_lock" be unheld on entry (confirmed by looking
at its definition), which is not satisfied in this example. So this is a
double lock error and could potentially lead to a deadlock on MP systems.

We also have a separate web interface to the summary database at:

http://glide.stanford.edu/linux-lock/

For example, typing "fh_put" in the input box gives

=========
SUMMARY
=========
FUNCTION SUMMARY: 'include/linux/nfsd/nfsfh.h:fh_put'
{
dcache_lock(global): [unlocked -> unlocked]
fhp(param#0)->fh_dentry->d_lock: [unlocked -> unlocked]
fhp(param#0)->fh_dentry->d_inode->i_sem: [locked -> unlocked]
}

Each line in the function summary correspond to the requirements and
effects on one particular lock. For example, fh_put requires that the
global lock variable dcache_lock be unheld on entry, and it'll remain
unheld on exit. It also requires fhp->fh_dentry->d_inode->i_sem be held on
entry and it'll release it on exit. (note: please ignore summaries for
lock premitives like spin_lock or down_interruptible; models for these
functions are built into the checker and the derived summaries are not
used).

We have found that some modules in the kernel has functions with
complicated synchronization behavior (esp. in filesystems), and we hope
summaries generated by this tool could be useful not only for bug finding,
but also for documentation purposes as well.

The analysis is intraprocedurally "path sensitive", so it won't be fooled
by cases like

if (flag & BLOCKING) spin_lock(&l);
...
if (flag & BLOCKING) spin_unlock(&l);

or

if (!spin_trylock(&l))
return -EAGAIN;
...
spin_unlock(&l);

The analysis algorithm models values (down to individual bits) and
pointers in the program with a boolean satisfiability solver with high
precision, and we're actively looking for other properties involving
(heavy) data dependencies where naive analysis would fail. Suggestions and
insights from the linux kernel community will be more than welcome!

As always, feedbacks and confirmations will be greatly appreciated!

Best regards,
Yichen Xie
<yxie@xxxxxxxxxxxxxxx>

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/