Re: [RFC PATCH 1/2] pipe: introduce busy wait for pipe

From: Steven Sistare
Date: Wed Sep 05 2018 - 09:46:11 EST


On 9/4/2018 8:50 PM, Subhra Mazumdar wrote:
> On 08/31/2018 09:09 AM, Steven Sistare wrote:
>> On 8/30/2018 4:24 PM, subhra mazumdar wrote:
>>> Introduce pipe_ll_usec field for pipes that indicates the number of
>>> microseconds a thread should spin if the pipe is empty or full before
>>> sleeping. This is similar to network sockets. Workloads like hackbench in
>>> pipe mode benefit significantly from this by avoiding the sleep and wakeup
>>> overhead. Other similar use cases can benefit. pipe_wait_flag is used to
>>> signal any thread busy waiting. pipe_busy_loop_timeout checks if spin time
>>> is over.
>>>
>>> Signed-off-by: subhra mazumdar <subhra.mazumdar@xxxxxxxxxx>
>>> ---
>>>  include/linux/pipe_fs_i.h | 19 +++++++++++++++++++
>>>  1 file changed, 19 insertions(+)
>>>
>>> diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h
>>> index e7497c9..fdfd2a2 100644
>>> --- a/include/linux/pipe_fs_i.h
>>> +++ b/include/linux/pipe_fs_i.h
>>> @@ -1,6 +1,8 @@
>>>  #ifndef _LINUX_PIPE_FS_I_H
>>>  #define _LINUX_PIPE_FS_I_H
>>>
>>> +#include <linux/sched/clock.h>
>>> +
>>>  #define PIPE_DEF_BUFFERS	16
>>>
>>>  #define PIPE_BUF_FLAG_LRU	0x01	/* page is on the LRU */
>>> @@ -54,6 +56,8 @@ struct pipe_inode_info {
>>>  	unsigned int waiting_writers;
>>>  	unsigned int r_counter;
>>>  	unsigned int w_counter;
>>> +	unsigned int pipe_ll_usec;
>>> +	unsigned long pipe_wait_flag;
>>>  	struct page *tmp_page;
>>>  	struct fasync_struct *fasync_readers;
>>>  	struct fasync_struct *fasync_writers;
>>> @@ -157,6 +161,21 @@ static inline int pipe_buf_steal(struct pipe_inode_info *pipe,
>>>  	return buf->ops->steal(pipe, buf);
>>>  }
>>>
>>> +static inline unsigned long pipe_busy_loop_current_time(void)
>>> +{
>>> +	return (unsigned long)(local_clock() >> 10);
>> Why ">> 10" ? local_clock() has nanosec units, and you compare to the tunable
>> pipe_ll_usec which has microsec units. Should be ">> 3". Better yet, redefine
>> the tunable to have nanosec units. I suspect you will need very large values
>> of the tunable to show similar results.
> It's 2^10. I don't think using nanosec units is necessary. It is unlikely
> that data will be read or written within nanoseconds. sk_busy_loop_timeout
> for sockets uses microseconds too.

Ah, you are using 2^10 as an approximation of 1000. OK.
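
To make the size of that approximation concrete, here is a minimal sketch
comparing the exact conversion with the shift. The helper names
ns_to_us_exact and ns_to_us_shift are hypothetical and do not appear in the
patch; only the ">> 10" itself comes from it.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration; not part of the patch. */

/* Exact conversion from nanoseconds to microseconds. */
static inline uint64_t ns_to_us_exact(uint64_t ns)
{
	return ns / 1000;
}

/*
 * Shift-based approximation as used in the patch: this divides by
 * 1024, so each resulting unit is 1.024 us rather than 1 us.
 */
static inline uint64_t ns_to_us_shift(uint64_t ns)
{
	return ns >> 10;
}
```

Under this approximation, a pipe_ll_usec value of N corresponds to roughly
N * 1.024 microseconds of spinning, i.e. about 2.4% longer than the nominal
value, in exchange for avoiding a 64-bit division on the hot path.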

- Steve

>>
>> Also, since this type of optimization consumes extra CPU cycles that could
>> be used by other tasks, show the overall CPU utilization before and after
>> the optimization, such as by using "time hackbench ...".
> OK.
>
> Thanks,
> Subhra
>>
>> - Steve
>>
>>> +}
>>> +
>>> +static inline bool pipe_busy_loop_timeout(struct pipe_inode_info *pipe,
>>> +					  unsigned long start_time)
>>> +{
>>> +	unsigned long bp_usec = READ_ONCE(pipe->pipe_ll_usec);
>>> +	unsigned long end_time = start_time + bp_usec;
>>> +	unsigned long now = pipe_busy_loop_current_time();
>>> +
>>> +	return time_after(now, end_time);
>>> +}
>>> +
>>>  /* Differs from PIPE_BUF in that PIPE_SIZE is the length of the actual
>>>     memory allocation, whereas PIPE_BUF makes atomicity guarantees. */
>>>  #define PIPE_SIZE		PAGE_SIZE
>>>
>
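
For readers following the quoted pipe_busy_loop_timeout(), the check depends
on a wraparound-safe comparison, which the kernel provides as time_after().
The following is a minimal userspace sketch of the same idea. The names
ticks_after and busy_loop_expired are hypothetical stand-ins, and the current
time is passed in as a parameter instead of being read from local_clock().

```c
#include <stdbool.h>

/*
 * Hypothetical userspace stand-in for the kernel's time_after(a, b):
 * true if a is later than b, even when the counter has wrapped.
 * The unsigned subtraction wraps modulo 2^N; reinterpreting the
 * difference as signed gives the direction.
 */
static inline bool ticks_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/*
 * True once more than `budget` ticks have elapsed since `start`.
 * `now` is supplied by the caller; in the patch it would come from
 * local_clock() >> 10.
 */
static inline bool busy_loop_expired(unsigned long start,
				     unsigned long budget,
				     unsigned long now)
{
	return ticks_after(now, start + budget);
}
```

A plain "now > start + budget" would give the wrong answer when
start + budget overflows, whereas the signed-difference form stays correct
as long as the two timestamps are less than half the counter range apart.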