Re: [PATCH v3 1/3] mm: introduce fincore()

From: Dave Hansen
Date: Mon Jul 07 2014 - 18:44:35 EST


On 07/07/2014 02:48 PM, Naoya Horiguchi wrote:
> On Mon, Jul 07, 2014 at 01:43:31PM -0700, Dave Hansen wrote:
>> The whole FINCORE_PGOFF vs. FINCORE_BMAP issue is something that will
>> come up in practice. We just don't have the interfaces for an end user
>> to pick which one they want to use.
>>
>>>> Is it really right to say this is going to be 8 bytes? Would we want it
>>>> to share types with something else, like be an loff_t?
>>>
>>> Could you elaborate on that?
>>
>> We specify file offsets in other system calls, like the lseek family. I
>> was just thinking that this type should match up with those calls since
>> they are expressing the same data type with the same ranges and limitations.
>
> The 2nd parameter is loff_t, do we already do this?

I mean the fields in the buffer, like:

> +Any of the following flags are to be set to add an 8 byte field in each entry.
> +You can set any of these flags at the same time, although you can't set
> +FINCORE_BMAP combined with these 8 byte field flags.


>>>> This would essentially tell userspace where in the kernel's address
>>>> space some user-controlled data will be.
>>>
>>> OK, so this and FINCORE_PAGEFLAGS will be limited for privileged users.
>
> Sorry, this statement of mine might be a bit short-sighted, and I'd like
> to retract it.
> I think that some page flags and/or NUMA info should be useful outside
> the debugging environment, and safe to expose to userspace. So limiting
> unprivileged users to the bitmap-only mode is too strict.

The PFN is not the same as NUMA information, and the PFN is insufficient
to describe the NUMA node on all systems that Linux supports.

Trying to get NUMA information back out is a good goal, but doing it
with PFNs is a bad idea since they have so many consequences.

I'm also bummed that exporting NUMA information was a design goal of these
patches but wasn't mentioned in any of the patch descriptions.

>> Then I'd just question their usefulness outside of a debugging
>> environment, especially when you can get at them in other (more
>> roundabout) ways in a debugging environment.
>>
>> This is really looking to me like two system calls. The bitmap-based
>> one, and another more extensible one. I don't think there's any harm in
>> having two system calls, especially when they're trying to glue together
>> two disparate interfaces.
>
> I think that separating the syscall into two, one for privileged users
> and one for unprivileged users, might be fine (rather than a bitmap-based
> one and an extensible one).

The problem as I see it is shoehorning two interfaces into the same
syscall. If there are privileged and unprivileged operations that use
the same _interfaces_ I think they should share a syscall.
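
For reference, this is roughly what the simple residency-only shape looks
like today with mmap() plus mincore(2): one status byte per page and
nothing else. It is only a sketch for comparison, not a proposal for how
fincore() should look; the helper name count_resident_pages is made up.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE  /* for mincore() on glibc */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Count how many pages of a file are resident in the page cache.
 * Stores the file's page count in *total_pages.
 * Returns the resident page count, or -1 on error or for an empty file.
 */
long count_resident_pages(const char *path, long *total_pages)
{
    assert(path != NULL && total_pages != NULL);

    long resident = -1;
    long page = sysconf(_SC_PAGESIZE);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }

    size_t len = (size_t)st.st_size;
    size_t npages = (len + (size_t)page - 1) / (size_t)page;
    *total_pages = (long)npages;

    /* mincore() works on a mapping, so the file has to be mapped first. */
    void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    unsigned char *vec = malloc(npages);   /* one status byte per page */

    if (map != MAP_FAILED && vec != NULL && mincore(map, len, vec) == 0) {
        resident = 0;
        for (size_t i = 0; i < npages; i++)
            resident += vec[i] & 1;        /* low bit set: page is resident */
    }

    free(vec);
    if (map != MAP_FAILED)
        munmap(map, len);
    close(fd);
    return resident;
}
```

Anything richer than "resident or not" (page flags, NUMA node, and so on)
needs a different result layout than this byte vector, which is the split
between the two interfaces that I'm talking about.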