Re: [PATCH] vfs: Avoid IPI storm due to bh LRU invalidation

From: Gilad Ben-Yossef
Date: Tue Feb 07 2012 - 11:25:17 EST

On Tue, Feb 7, 2012 at 12:25 AM, Jan Kara <jack@xxxxxxx> wrote:
> On Mon 06-02-12 13:17:17, Andrew Morton wrote:
>> On Mon, 6 Feb 2012 17:47:32 +0100
>> Jan Kara <jack@xxxxxxx> wrote:
>> > On Mon 06-02-12 21:12:36, Srivatsa S. Bhat wrote:
>> > > On 02/06/2012 07:25 PM, Jan Kara wrote:
>> > >
>> > > > When discovery of lots of disks happen in parallel, we call
>> > > > invalidate_bh_lrus() once for each disk from partitioning code resulting in a
>> > > > storm of IPIs and causing a softlockup detection to fire (it takes several
>> > > > *minutes* for a machine to execute all the invalidate_bh_lrus() calls).
>> Gad.  How many disks are we talking about here?
>  I think something around hundred scsi disks in this case (number of
> physical drives is actually lower but multipathing blows it up). I actually
> saw machines with close to thousand scsi disks (yes, they had names like
> sdabc ;).

LOL. Is that a huge SCSI disk array in your server or are you just
happy to see me... ? :-)

>> > >
>> > > Something related that you might be interested in:
>> > >
>> > >
>> > > (This is part of Gilad's patchset that tries to reduce cross-CPU IPI
>> > > interference.)
>> >   Thanks for the pointer. I didn't know about it. As Hannes wrote, this
>> > need not be enough for our use case as there might indeed be some bhs in
>> > the LRU. But I'd be interested how well the patchset works anyway. Maybe it
>> > would be enough because after all when we invalidate LRUs subsequent
>> > callers will see them empty and not issue IPI? Hannes, can you give a try
>> > to the patches?

I think it's worth a shot, since the mutex just delays the IPIs instead
of canceling them.

A somewhat similar issue in the direct reclaim path of the buddy allocator,
which tries to reclaim per-CPU pages, was causing a massive storm of IPIs
during OOM with concurrent workloads. The IPI noise patches mitigate 85% of
those IPIs just by checking whether there are any per-CPU pages on the CPU
you are about to IPI, so maybe the same kind of logic applies here as well.
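
To make the "check before you IPI" idea concrete, here is a rough user-space
sketch. It is not the kernel code: struct percpu_cache, cpu_needs_flush() and
both flush functions are hypothetical names made up for illustration, and
zeroing a counter stands in for the work the remote IPI handler would do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CPUS 64

/* Hypothetical per-CPU state: how many entries each CPU has cached. */
struct percpu_cache {
	unsigned int nr_entries[MAX_CPUS];
};

/* Predicate: does this CPU hold anything worth flushing? */
static bool cpu_needs_flush(const struct percpu_cache *c, size_t cpu)
{
	return c->nr_entries[cpu] != 0;
}

/*
 * Baseline: interrupt every CPU whether or not it has work to do.
 * Returns the number of (simulated) IPIs sent.
 */
static size_t flush_unconditional(struct percpu_cache *c, size_t ncpus)
{
	assert(ncpus <= MAX_CPUS);
	for (size_t cpu = 0; cpu < ncpus; cpu++)
		c->nr_entries[cpu] = 0;	/* stands in for the IPI handler */
	return ncpus;
}

/*
 * Conditional variant: only interrupt CPUs whose cache is non-empty.
 * Returns the number of (simulated) IPIs sent.
 */
static size_t flush_conditional(struct percpu_cache *c, size_t ncpus)
{
	size_t ipis = 0;

	assert(ncpus <= MAX_CPUS);
	for (size_t cpu = 0; cpu < ncpus; cpu++) {
		if (!cpu_needs_flush(c, cpu))
			continue;	/* nothing cached here: skip the IPI */
		c->nr_entries[cpu] = 0;	/* stands in for the IPI handler */
		ipis++;
	}
	return ipis;
}
```

With four CPUs of which two have cached entries, the first conditional flush
sends two IPIs and a second call right after it sends none, while the
unconditional version sends four every time. In a real kernel the check
would race with CPUs refilling their caches, so treat this only as an
illustration of the filtering step.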


Gilad Ben-Yossef
Chief Coffee Drinker
Israel Cell: +972-52-8260388
US Cell: +1-973-8260388

"If you take a class in large-scale robotics, can you end up in a
situation where the homework eats your dog?"
 -- Jean-Baptiste Queru