Re: [PATCH 12/24] getgrent_r.3: Use sizeof() to get buffer size (instead of hardcoding macro name)

From: Alejandro Colomar
Date: Thu Sep 24 2020 - 09:09:57 EST


Hi Michael,

On 2020-09-24 13:38, Michael Kerrisk (man-pages) wrote:
> On 9/24/20 11:35 AM, Alejandro Colomar wrote:
>> Hi,
>>
>> On 2020-09-23 22:35, Michael Kerrisk (man-pages) wrote:
>>> On 9/15/20 12:03 PM, Stefan Puiu wrote:
>>>> Hi,
>>>>
>>>> On Fri, Sep 11, 2020 at 6:28 PM Alejandro Colomar
>>>> <colomar.6.4.3@xxxxxxxxx> wrote:
>>>>>
>>>>> Hi Stefan,
>>>>>
>>>>> On 2020-09-11 16:35, Stefan Puiu wrote:
>>>>> > Hi,
>>>>> >
>>>>> > On Fri, Sep 11, 2020 at 12:15 AM Alejandro Colomar
>>>>> > <colomar.6.4.3@xxxxxxxxx> wrote:
>>>>> >>
>>>>> >> Signed-off-by: Alejandro Colomar <colomar.6.4.3@xxxxxxxxx>
>>>>> >> ---
>>>>> >> man3/getgrent_r.3 | 2 +-
>>>>> >> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>> >>
>>>>> >> diff --git a/man3/getgrent_r.3 b/man3/getgrent_r.3
>>>>> >> index 81d81a851..76deec370 100644
>>>>> >> --- a/man3/getgrent_r.3
>>>>> >> +++ b/man3/getgrent_r.3
>>>>> >> @@ -186,7 +186,7 @@ main(void)
>>>>> >>
>>>>> >> setgrent();
>>>>> >> while (1) {
>>>>> >> - i = getgrent_r(&grp, buf, BUFLEN, &grpp);
>>>>> >> + i = getgrent_r(&grp, buf, sizeof(buf), &grpp);
>>>>> >
>>>>> > I'm worried that less attentive people might copy/paste parts of this
>>>>> > in their code, where maybe buf is just a pointer, and expect it to
>>>>> > work. Maybe leaving BUFLEN here is useful as a reminder that they need
>>>>> > to change something to adapt the code?
>>>>> >
>>>>> > Just my 2 cents,
>>>>> > Stefan.
>>>>> >
>>>>> That's a very good point.
>>>>>
>>>>> So we have 3 options and I will propose now a 4th one. Let's see all
>>>>> of them and see which one is better for the man pages.
>>>>>
>>>>> 1.- Use the macro everywhere.
>>>>>
>>>>> pros:
>>>>> - It is still valid when the buffer is a pointer and not an array.
>>>>> cons:
>>>>> - Hardcodes the initializer. If the array is later initialized with a
>>>>> different value, it may produce a silent bug, or a compilation break.
>>>>>
>>>>> 2.- Use sizeof() everywhere, and the macro for the initializer.
>>>>>
>>>>> pros:
>>>>> - It is valid as long as the buffer is an array.
>>>>> cons:
>>>>> - If the code gets into a function, and the buffer is then a pointer,
>>>>> it will definitively produce a silent bug.
>>>>>
>>>>> 3.- Use sizeof() everywhere, and a magic number for the initializer.
>>>>>
>>>>> The same as 2.
>>>>>
>>>>> 4.- Use ARRAY_BYTES() macro
>>>>>
>>>>> pros:
>>>>> - It is always safe and when code changes, it may break compilation, but
>>>>> never a silent bug.
>>>>> cons:
>>>>> - Add a few lines of code. Maybe too much complexity for an example.
>>>>> But I'd say that it is the only safe option, and in real code it
>>>>> should probably be used more, so maybe it's good to show a good practice.
>>>>
>>>> If you ask me, I think examples should be simple and easy to
>>>> understand, and easy to copy/paste in your code. I'd settle for easy
>>>> enough, not perfect or completely foolproof. If you need to look up
>>>> obscure gcc features to understand an example, that's not very
>>>> helpful. So I'd be more inclined to prefer version 1 above. But let's
>>>> see Michael's opinion on this.
>>>>
>>>> Just my 2c,
>>>
>>> So, the fundamental problem is that C is nearly 50 years old.
>>> It's a great high-level assembly language, but when it comes
>>> to nuances like this it gets pretty painful. One can do macro
>>> magic of the kind you suggest, but I agree with Stefan that it
>>> gets confusing and distracting for the reader. I think I also
>>> lean to solution 1. Yes, it's not perfect, but it's easy to
>>> understand, and I don't think we can or should try and solve
>>> the broken-ness of C in the manual pages.
>>>
>>> Thanks,
>>>
>>> Michael
>>>
>>>
>>
>> I was reverting the 3 patches I introduced (they changed from solution 1
>> to solution 2), and also was grepping for already existing solution 2 in
>> the pages (it seems that solution 2 was a bit more extended than
>> solution 1).
>>
>> While doing that, I've been thinking about it again...
>>
>> There's a good thing about sizeof (even though I admit it's very
>> insecure; and I never use it for myself), especially for the man pages:
>>
>> I'll copy here a sample from getnameinfo.3 to ilustrate it:
>>
>> [[
>> .EX
>> struct sockaddr *addr; /* input */
>> socklen_t addrlen; /* input */
>> char hbuf[NI_MAXHOST], sbuf[NI_MAXSERV];
>>
>> if (getnameinfo(addr, addrlen, hbuf, sizeof(hbuf), sbuf,
>> sizeof(sbuf), NI_NUMERICHOST | NI_NUMERICSERV) == 0)
>> printf("host=%s, serv=%s\en", hbuf, sbuf);
>> .EE
>> ]]
>>
>> Here, it's clear to the reader that the 4th argument to 'getnameinfo()'
>> is the size of the buffer passed as the 3rd argument.
>>
>> If the function call was changed to
>>
>> [[
>> getnameinfo(addr, addrlen, hbuf, NI_MAXHOST, sbuf,
>> sizeof(sbuf), NI_NUMERICHOST | NI_NUMERICSERV)
>> ]]
>>
>> then it would be less clear, and the reader should go back and forth to
>> see where that comes from. In this short example it is relatively very
>> clear, but in some examples it might be less clear.
>>
>> Would you maintain your preference for solution 1?
>>
>>
>> Also... I am trying to patch glibc to provide a safe version of
>> 'nitems()', and shortly after they accept that patch (if they do), I'll
>> send another one to add a safe 'array_bytes()' based on 'nitems()'.
>>
>> Maybe the examples could use 'array_bytes()'; although is will be a
>> glibc extension, and non-existent in any other POSIX systems, of course,
>> which would make the examples non-portable, but still can be solved with
>> a simple
>>
>> [[
>> #if !defined(array_bytes)
>> #define array_bytes() sizeof()
>> #endif
>> ]]
>>
>> But again it complicates the examples...
>
> (And I'd rather not...)
>
>> I'm not sure at all about what should be done. Please comment. If you
>> still prefer solution 1, I'll send you a patch with the revert + fixes,
>> but I think it's very delicate.
>
> Okay -- I agree with your perspective of the getnameinfo example.
>
> So, I think maybe solution 1 is clearer sometimes, and other times
> solution 2 is clearer. I don't feel too strongly about this

Me neither.

> (because we can't solve the bugger problem, which is C).
> I'm fine with not reverting the three patches you
> refer to. I'm going to leave this decision to you :-)!
>
> Thanks,
>
> Michael
>
>

I'll keep it as is now. :)

Thanks,

Alex