Re: [PATCH v3 1/1] mm/madvise: replace ptrace attach requirement for process_madvise

From: David Hildenbrand
Date: Fri Mar 05 2021 - 12:54:14 EST


On 05.03.21 18:45, Shakeel Butt wrote:
On Fri, Mar 5, 2021 at 9:37 AM David Hildenbrand <david@xxxxxxxxxx> wrote:

On 04.03.21 01:03, Shakeel Butt wrote:
On Wed, Mar 3, 2021 at 3:34 PM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:

On Wed, Mar 3, 2021 at 3:17 PM Shakeel Butt <shakeelb@xxxxxxxxxx> wrote:

On Wed, Mar 3, 2021 at 10:58 AM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:

process_madvise currently requires ptrace attach capability.
PTRACE_MODE_ATTACH gives one process complete control over another
process. It effectively removes the security boundary between the
two processes (in one direction). Granting ptrace attach capability
even to a system process is considered dangerous since it creates an
attack surface. This severely limits the usage of this API.
The operations process_madvise can perform do not affect the correctness
of the operation of the target process; they only affect where the data
is physically located (and therefore, how fast it can be accessed).
What we want is the ability for one process to influence another process
in order to optimize performance across the entire system while leaving
the security boundary intact.
Replace PTRACE_MODE_ATTACH with a combination of PTRACE_MODE_READ
and CAP_SYS_NICE. PTRACE_MODE_READ to prevent leaking ASLR metadata
and CAP_SYS_NICE for influencing process performance.

Cc: stable@xxxxxxxxxxxxxxx # 5.10+
Signed-off-by: Suren Baghdasaryan <surenb@xxxxxxxxxx>
Reviewed-by: Kees Cook <keescook@xxxxxxxxxxxx>
Acked-by: Minchan Kim <minchan@xxxxxxxxxx>
Acked-by: David Rientjes <rientjes@xxxxxxxxxx>
---
changes in v3
- Added Reviewed-by: Kees Cook <keescook@xxxxxxxxxxxx>
- Created man page for process_madvise per Andrew's request: https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=a144f458bad476a3358e3a45023789cb7bb9f993
- cc'ed stable@xxxxxxxxxxxxxxx # 5.10+ per Andrew's request
- cc'ed linux-security-module@xxxxxxxxxxxxxxx per James Morris's request

mm/madvise.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index df692d2e35d4..01fef79ac761 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1198,12 +1198,22 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
goto release_task;
}

- mm = mm_access(task, PTRACE_MODE_ATTACH_FSCREDS);
+ /* Require PTRACE_MODE_READ to avoid leaking ASLR metadata. */
+ mm = mm_access(task, PTRACE_MODE_READ_FSCREDS);
if (IS_ERR_OR_NULL(mm)) {
ret = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH;
goto release_task;
}

+ /*
+ * Require CAP_SYS_NICE for influencing process performance. Note that
+ * only non-destructive hints are currently supported.

How is non-destructive defined? Is MADV_DONTNEED non-destructive?

Non-destructive in this context means the data is not lost and can be
recovered. I follow the logic described in
https://lwn.net/Articles/794704/ where Minchan was introducing
MADV_COLD and MADV_PAGEOUT as non-destructive versions of MADV_FREE
and MADV_DONTNEED. Following that logic, MADV_FREE and MADV_DONTNEED
would be considered destructive hints.
Note that process_madvise_behavior_valid() allows only MADV_COLD and
MADV_PAGEOUT at the moment, which are both non-destructive.


There is a plan to support MADV_DONTNEED for this syscall. Do we need
to change these access checks again with that support?

Eh, I absolutely don't think letting another process discard memory in
another process' address space is a good idea. The target process can
observe that easily and might even run into real issues.

What's the use case?


Userspace oom reaper. Please look at
https://lore.kernel.org/linux-api/20201014183943.GA1489464@xxxxxxxxxx/T/


Thanks, somehow I missed that (not that it really changed my opinion on the approach while skimming over the discussion :) will have a more detailed look)

--
Thanks,

David / dhildenb