Re: [RFC] Bypass filesystems for reading cached pages

From: Chaitanya Kulkarni
Date: Fri Jun 19 2020 - 17:26:09 EST


Matthew,

On 6/19/20 1:12 PM, Matthew Wilcox wrote:
> On Fri, Jun 19, 2020 at 07:06:19PM +0000, Chaitanya Kulkarni wrote:
>> On 6/19/20 8:50 AM, Matthew Wilcox wrote:
>>> This patch lifts the IOCB_CACHED idea expressed by Andreas to the VFS.
>>> The advantage of this patch is that we can avoid taking any filesystem
>>> lock, as long as the pages being accessed are in the cache (and we don't
>>> need to readahead any pages into the cache). We also avoid an indirect
>>> function call in these cases.
>>
>> I did a testing with NVMeOF target file backend with buffered I/O
>> enabled with your patch and setting the IOCB_CACHED for each I/O ored
>> '|' with IOCB_NOWAIT calling call_read_iter_cached() [1].
>>
>> The name was changed from call_read_iter() -> call_read_iter_cached() [2]).
>>
>> For the file system I've used XFS and device was null_blk with memory
>> backed so entire file was cached into the DRAM.
>
> Thanks for testing! Can you elaborate a little more on what the test does?
> Are there many threads or tasks? What is the I/O path? XFS on an NVMEoF
> device, talking over loopback to localhost with nullblk as the server?
>
> The nullblk device will have all the data in its pagecache, but each XFS
> file will have an empty pagecache initially. Then it'll be populated by
> the test, so does the I/O pattern revisit previously accessed data at all?
>

Will share details shortly.

>> Following are the performance numbers :-
>>
>> IOPS/Bandwidth :-
>>
>> default-page-cache: read: IOPS=1389k, BW=5424MiB/s (5688MB/s)
>> default-page-cache: read: IOPS=1381k, BW=5395MiB/s (5657MB/s)
>> default-page-cache: read: IOPS=1391k, BW=5432MiB/s (5696MB/s)
>> iocb-cached-page-cache: read: IOPS=1403k, BW=5481MiB/s (5747MB/s)
>> iocb-cached-page-cache: read: IOPS=1393k, BW=5439MiB/s (5704MB/s)
>> iocb-cached-page-cache: read: IOPS=1399k, BW=5465MiB/s (5731MB/s)
>
> That doesn't look bad at all ... about 0.7% increase in IOPS.
>
>> Submission lat :-
>>
>> default-page-cache: slat (usec): min=2, max=1076, avg= 3.71,
>> default-page-cache: slat (usec): min=2, max=489, avg= 3.72,
>> default-page-cache: slat (usec): min=2, max=1078, avg= 3.70,
>> iocb-cached-page-cache: slat (usec): min=2, max=1731, avg= 3.70,
>> iocb-cached-page-cache: slat (usec): min=2, max=2115, avg= 3.69,
>> iocb-cached-page-cache: slat (usec): min=2, max=3055, avg= 3.70,
>
> Average latency unchanged, max latency up a little ... makes sense,
> since we'll do a little more work in the worst case.
>
>> @@ -264,7 +267,8 @@ static void nvmet_file_execute_rw(struct nvmet_req *req)
>>
>> if (req->ns->buffered_io) {
>> if (likely(!req->f.mpool_alloc) &&
>> - nvmet_file_execute_io(req, IOCB_NOWAIT))
>> + nvmet_file_execute_io(req,
>> + IOCB_NOWAIT |IOCB_CACHED))
>> return;
>> nvmet_file_submit_buffered_io(req);
>
> You'll need a fallback path here, right? IOCB_CACHED can get part-way
> through doing a request, and then need to be finished off after taking
> the mutex.
>
>

That is what something needs to be done in the call_read_iter cached
otherwise with a flag otherwise callers will have to duplicate the code
all over the kernel. Correct me if I'm wrong.

For now I'm populating all the data in the cache for fio runs so 2nd and
3rd run is coming from the cache.


Also, looking at the code now I don't think we need to use IOCB_CACHED
flag as long as we call call_read_iter_cached() since it takes care of
adding this flags which is a correct way of abstracting the code, than
caller duplicate this flag all over the kernel.

Am I missing something ?

I'll run test again and share more details.