Re: [PATCH v1] mm, proc: Implement /proc/<pid>/totmaps

From: Sonny Rao
Date: Wed Aug 10 2016 - 14:25:35 EST


On Tue, Aug 9, 2016 at 2:01 PM, Robert Foss <robert.foss@xxxxxxxxxxxxx> wrote:
>
>
> On 2016-08-09 03:24 PM, Jann Horn wrote:
>>
>> On Tue, Aug 09, 2016 at 12:05:43PM -0400, robert.foss@xxxxxxxxxxxxx wrote:
>>>
>>> From: Sonny Rao <sonnyrao@xxxxxxxxxxxx>
>>>
>>> This is based on earlier work by Thiago Goncales. It implements a new
>>> per process proc file which summarizes the contents of the smaps file
>>> but doesn't display any addresses. It gives more detailed information
>>> than statm, like the PSS (proportional set size). It differs from the
>>> original implementation in that it doesn't use the full-blown set of
>>> seq operations, uses a different termination condition, and doesn't
>>> display "Locked", as that was broken in the original implementation.
>>>
>>> This new proc file provides information faster than parsing the
>>> potentially huge smaps file.
>>>
>>> Signed-off-by: Sonny Rao <sonnyrao@xxxxxxxxxxxx>
>>>
>>> Tested-by: Robert Foss <robert.foss@xxxxxxxxxxxxx>
>>> Signed-off-by: Robert Foss <robert.foss@xxxxxxxxxxxxx>
>>
>>
>>
>>> +static int totmaps_proc_show(struct seq_file *m, void *data)
>>> +{
>>> + struct proc_maps_private *priv = m->private;
>>> + struct mm_struct *mm;
>>> + struct vm_area_struct *vma;
>>> + struct mem_size_stats *mss_sum = priv->mss;
>>> +
>>> + /* reference to priv->task already taken */
>>> + /* but need to get the mm here because */
>>> + /* task could be in the process of exiting */
>>
>>
>> Can you please elaborate on this? My understanding here is that you
>> intend for the caller to be able to repeatedly read the same totmaps
>> file with pread() and still see updated information after the target
>> process has called execve() and be able to detect process death
>> (instead of simply seeing stale values). Is that accurate?
>>
>> I would prefer it if you could grab a reference to the mm_struct
>> directly at open time.
>
>
> Sonny, do you know more about the above comment?

I think right now the file gets re-opened every time, but the mode
where the file is opened once and repeatedly read is interesting
because it avoids having to open the file again and again.

I guess you could end up with a weird situation where you don't read
the entire contents of the file in one call to read() and you might
get inconsistent data across the different statistics?

>
>>
>>
>>> + mm = get_task_mm(priv->task);
>>> + if (!mm || IS_ERR(mm))
>>> + return -EINVAL;
>>
>>
>> get_task_mm() doesn't return error codes, and all other callers just
>> check whether the return value is NULL.
>>
>
> I'll have that fixed in v2, thanks for spotting it!
>
>
>>
>>> + down_read(&mm->mmap_sem);
>>> + hold_task_mempolicy(priv);
>>> +
>>> + for (vma = mm->mmap; vma != priv->tail_vma; vma = vma->vm_next) {
>>> + struct mem_size_stats mss;
>>> + struct mm_walk smaps_walk = {
>>> + .pmd_entry = smaps_pte_range,
>>> + .mm = vma->vm_mm,
>>> + .private = &mss,
>>> + };
>>> +
>>> + if (vma->vm_mm && !is_vm_hugetlb_page(vma)) {
>>> + memset(&mss, 0, sizeof(mss));
>>> + walk_page_vma(vma, &smaps_walk);
>>> + add_smaps_sum(&mss, mss_sum);
>>> + }
>>> + }
>>
>>
>> Errrr... what? You accumulate values from mem_size_stats items into a
>> struct mss_sum that is associated with the struct file? So when you
>> read the file the second time, you get the old values plus the new ones?
>> And when you read the file in parallel, you get inconsistent values?
>>
>> For most files in procfs, the behavior is that you can just call
>> pread(fd, buf, sizeof(buf), 0) on the same fd again and again, giving
>> you the current values every time, without mutating state. I strongly
>> recommend that you get rid of priv->mss and just accumulate the state
>> in a local variable (maybe one on the stack).
>
>
> So a simple "static struct mem_size_stats" in totmaps_proc_show() would be a
> better solution?
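A static would be shared across all readers of all totmaps files and therefore racy; Jann's suggestion is a plain stack local, zeroed on each show call. Untested, but roughly (names taken from the patch):

```c
static int totmaps_proc_show(struct seq_file *m, void *data)
{
	struct proc_maps_private *priv = m->private;
	struct mm_struct *mm;
	struct vm_area_struct *vma;
	/* accumulate on the stack, zeroed per call, so repeated
	 * reads of the same fd don't see stale or summed state */
	struct mem_size_stats mss_sum;

	memset(&mss_sum, 0, sizeof(mss_sum));
	...
	/* walk the VMAs, add each VMA's mss into mss_sum, then
	 * seq_printf() the totals from mss_sum */
}
```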
>
>>
>>
>>> @@ -836,6 +911,50 @@ static int tid_smaps_open(struct inode *inode, struct file *file)
>>> return do_maps_open(inode, file, &proc_tid_smaps_op);
>>> }
>>>
>>> +static int totmaps_open(struct inode *inode, struct file *file)
>>> +{
>>> + struct proc_maps_private *priv;
>>> + int ret = -ENOMEM;
>>> + priv = kzalloc(sizeof(*priv), GFP_KERNEL);
>>> + if (priv) {
>>> + priv->mss = kzalloc(sizeof(*priv->mss), GFP_KERNEL);
>>> + if (!priv->mss)
>>> + return -ENOMEM;
>>
>>
>> Memory leak: If the first allocation works and the second one doesn't,
>> this
>> doesn't free the first allocation.
>>
>> Please change this to use the typical goto pattern for error handling.
>
>
> Fix will be implemented in v2.
>
>>
>>> +
>>> + /* we need to grab references to the task_struct */
>>> + /* at open time, because there's a potential information */
>>> + /* leak where the totmaps file is opened and held open */
>>> + /* while the underlying pid to task mapping changes */
>>> + /* underneath it */
>>
>>
>> Nit: That's not how comments are done in the kernel. Maybe change this to
>> a normal block comment instead of one block comment per line?
>
>
> I'm not sure how that one slipped by, but I'll change it in v2.
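For reference, the usual kernel style would be one block comment rather than a comment per line:

```c
	/*
	 * We need to grab a reference to the task_struct at open
	 * time, because there's a potential information leak where
	 * the totmaps file is opened and held open while the
	 * underlying pid-to-task mapping changes underneath it.
	 */
```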
>
>>
>>> + priv->task = get_pid_task(proc_pid(inode), PIDTYPE_PID);
>>
>>
>> `get_pid_task(proc_pid(inode), PIDTYPE_PID)` is exactly the definition
>> of get_proc_task(inode), maybe use that instead?
>>
>
> Will do. v2 will fix this.
>
>>> + if (!priv->task) {
>>> + kfree(priv->mss);
>>> + kfree(priv);
>>> + return -ESRCH;
>>> + }
>>> +
>>> + ret = single_open(file, totmaps_proc_show, priv);
>>> + if (ret) {
>>> + put_task_struct(priv->task);
>>> + kfree(priv->mss);
>>> + kfree(priv);
>>> + }
>>> + }
>>> + return ret;
>>> +}
>>
>>
>> Please change this method to use the typical goto pattern for error
>> handling. IMO repeating the undo steps in all error cases makes
>> mistakes (like the one above) more likely and increases the amount
>> of redundant code.
>
>
> Agreed. Change queued for v2.
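Untested, but the goto-unwind version of totmaps_open() would look roughly like this (names taken from the patch; each label undoes exactly the steps that succeeded before the failure):

```c
static int totmaps_open(struct inode *inode, struct file *file)
{
	struct proc_maps_private *priv;
	int ret;

	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
	if (!priv)
		return -ENOMEM;

	priv->mss = kzalloc(sizeof(*priv->mss), GFP_KERNEL);
	if (!priv->mss) {
		ret = -ENOMEM;
		goto out_free_priv;
	}

	priv->task = get_proc_task(inode);
	if (!priv->task) {
		ret = -ESRCH;
		goto out_free_mss;
	}

	ret = single_open(file, totmaps_proc_show, priv);
	if (ret)
		goto out_put_task;

	return 0;

out_put_task:
	put_task_struct(priv->task);
out_free_mss:
	kfree(priv->mss);
out_free_priv:
	kfree(priv);
	return ret;
}
```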
>
>>
>> Also: The smaps file is only accessible to callers with
>> PTRACE_MODE_READ privileges on the target task. Your thing doesn't
>> do any access checks, neither in the open handler nor in the read
>> handler. Can you give an analysis of why it's okay to expose this
>> data? As far as I can tell, without spending a lot of time thinking
>> about it, this kind of data looks like it might potentially be
>> useful for side-channel information leaks or so.
>>
>
> I think it should require the same permissions as smaps, so changing the
> code to require PTRACE_MODE_READ privileges is most likely a good idea. I'll
> have a look at it for v2.
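Smaps gets its check via proc_mem_open(), which does the ptrace-mode access check and also resolves the mm at open time (addressing Jann's earlier point as well). A rough, untested sketch of what totmaps_open() could do instead of get_task_mm():

```c
	/* resolve the mm with a PTRACE_MODE_READ access check, the
	 * same way smaps does via proc_mem_open() */
	mm = proc_mem_open(inode, PTRACE_MODE_READ);
	if (IS_ERR(mm))
		return PTR_ERR(mm);
```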