Re: objrmap-core-1 (rmap removal for file mappings to avoid 4:4 in <=16G machines)

From: Ingo Molnar
Date: Tue Mar 09 2004 - 04:05:12 EST

* Ingo Molnar <mingo@xxxxxxx> wrote:

> ugh? This is not my main argument against 'objrmap'. My main (and
> pretty much only) argument against it is the linear searching it
> reintroduces.

to clarify this somewhat. objrmap works fine (and roughly equivalently
to rmap) when processes map files via one vma mostly. (e.g. shared

objrmap falls apart badly if there are tons but _disjunct_ vmas to the
same inode. One such workload is Oracle's 'indirect buffer cache'. It is
a ~512 MB virtual memory window with 32KB mapsize, mapping to a much
larger shmfs page, featuring 16 thousand vmas per process.

The problem is that the ->i_mmap and ->i_mmap_shared lists 'merge' _all_
the vmas that somehow belong to the given inode. Ie. in the above case
it has a length of 16 thousand entries. So while possibly none of those
vmas shares any physical page with each other, your
try_to_unmap_obj(page) function will loop through possibly thousands of
vmas, killing the VM's ability to reclaim. (and also presenting the
'kswapd is stuck and eating up CPU time' phenomenon to users.)

Andrea, did you know about this property of your patch? If yes, why
didnt you mention it in the announcement, as a tradeoff to take care of?

it's ironic that precisely the workload you cite (shmfs for IO cache,
when the shared memory size is larger than what the process can map) is
the one that would hurt most from objrmap. In that workload there can be
possibly tens of thousands of disjunct vmas mapping the same shmfs inode
and kswapd would loop endlessly without achieving anything useful.

(remap_file_pages() will handle such workloads fine, but it's still a
big regression for those applications that happen to have more than a
handful of disjunct vmas per inode. I obviously like fremap, but i dont
want to force it down anyone's throat.)

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at