Re: [RFC bpf-next 1/1] bpf: Introduce iter_pagecache

From: Matthew Wilcox
Date: Thu Apr 08 2021 - 02:14:26 EST


On Wed, Apr 07, 2021 at 02:46:11PM -0700, Daniel Xu wrote:
> +struct bpf_iter_seq_pagecache_info {
> +	struct mnt_namespace *ns;
> +	struct radix_tree_root superblocks;

Why are you adding a new radix tree? Use an XArray instead.

> +static struct page *goto_next_page(struct bpf_iter_seq_pagecache_info *info)
> +{
> +	struct page *page, *ret = NULL;
> +	unsigned long idx;
> +
> +	rcu_read_lock();
> +retry:
> +	BUG_ON(!info->cur_inode);
> +	ret = NULL;
> +	xa_for_each_start(&info->cur_inode->i_data.i_pages, idx, page,
> +			  info->cur_page_idx) {
> +		if (!page_cache_get_speculative(page))
> +			continue;

Why do you feel the need to poke around in i_pages directly? Is there
something wrong with find_get_entries()?

> +static int __pagecache_seq_show(struct seq_file *seq, struct page *page,
> +				 bool in_stop)
> +{
> +	struct bpf_iter_meta meta;
> +	struct bpf_iter__pagecache ctx;
> +	struct bpf_prog *prog;
> +
> +	meta.seq = seq;
> +	prog = bpf_iter_get_info(&meta, in_stop);
> +	if (!prog)
> +		return 0;
> +
> +	meta.seq = seq;
> +	ctx.meta = &meta;
> +	ctx.page = page;
> +	return bpf_iter_run_prog(prog, &ctx);

I'm not really keen on the idea of random BPF programs being able to poke
at pages in the page cache like this. From your initial description,
it sounded like all you needed was a list of which pages are present.

> +	INIT_RADIX_TREE(&info->superblocks, GFP_KERNEL);
> +
> +	spin_lock(&info->ns->ns_lock);
> +	list_for_each_entry(mnt, &info->ns->list, mnt_list) {
> +		sb = mnt->mnt.mnt_sb;
> +
> +		/* The same mount may be mounted in multiple places */
> +		if (radix_tree_lookup(&info->superblocks, (unsigned long)sb))
> +			continue;
> +
> +		err = radix_tree_insert(&info->superblocks,
> +					(unsigned long)sb, (void *)1);
> +		if (err)
> +			goto out;
> +	}
> +
> +	radix_tree_for_each_slot(slot, &info->superblocks, &iter, 0) {
> +		sb = (struct super_block *)iter.index;
> +		atomic_inc(&sb->s_active);
> +	}

Uh. What on earth made you think this was a good way to use the radix
tree? And, no, the XArray doesn't change that.

If you don't understand why this is so bad, call xa_dump() on it after
constructing it. I'll wait.
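
For anyone who would rather not boot a kernel to see it, here is a rough
userspace sketch of the problem. It is not kernel code: it only models a
tree that consumes the index six bits per level (the usual chunk size,
but treat that as an assumption), it ignores details such as index 0
being stored without a node, and the helper names tree_levels(),
node_prefix() and count_nodes() are made up for this illustration. It
counts how many interior nodes are needed to hold a given set of indices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHUNK_SHIFT	6	/* assumed: 64 slots per node */
#define INDEX_BITS	64

/* Number of levels needed so that the top node covers 'index'. */
static unsigned int tree_levels(uint64_t index)
{
	unsigned int levels = 0;

	do {
		levels++;
		index >>= CHUNK_SHIFT;
	} while (index);

	return levels;
}

/* Identify the node at 'level' (0 = leaf level) that holds 'index'. */
static uint64_t node_prefix(uint64_t index, unsigned int level)
{
	unsigned int shift = (level + 1) * CHUNK_SHIFT;

	/* Shifting by >= 64 is undefined; the top node has prefix 0. */
	return shift >= INDEX_BITS ? 0 : index >> shift;
}

/* Count the nodes needed to store all 'n' indices in one tree. */
static size_t count_nodes(const uint64_t *idx, size_t n)
{
	unsigned int levels, level;
	uint64_t max = 0;
	size_t i, j, nodes = 0;

	assert(n > 0);

	for (i = 0; i < n; i++)
		if (idx[i] > max)
			max = idx[i];
	levels = tree_levels(max);

	for (level = 0; level < levels; level++) {
		for (i = 0; i < n; i++) {
			uint64_t p = node_prefix(idx[i], level);

			/* Count each distinct prefix once per level. */
			for (j = 0; j < i; j++)
				if (node_prefix(idx[j], level) == p)
					break;
			if (j == i)
				nodes++;
		}
	}

	return nodes;
}
```

Feed count_nodes() four small consecutive indices and it reports a single
node. Feed it one pointer-sized value with the top bit set and it reports
eleven, all to store a single (void *)1. Pointers that are far apart share
only the upper levels, so every additional superblock drags in its own
chain of lower-level nodes.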