Re: [RFC PATCH bpf-next v2 4/4] error-injection: Support fault injection framework

From: Masami Hiramatsu
Date: Thu Dec 28 2017 - 02:51:51 EST


On Wed, 27 Dec 2017 19:49:28 -0800
Alexei Starovoitov <ast@xxxxxx> wrote:

> On 12/27/17 5:38 PM, Masami Hiramatsu wrote:
> > On Wed, 27 Dec 2017 14:49:46 -0800
> > Alexei Starovoitov <ast@xxxxxx> wrote:
> >
> >> On 12/27/17 12:09 AM, Masami Hiramatsu wrote:
> >>> On Tue, 26 Dec 2017 18:12:56 -0800
> >>> Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote:
> >>>
> >>>> On Tue, Dec 26, 2017 at 04:48:25PM +0900, Masami Hiramatsu wrote:
> >>>>> Support in-kernel fault-injection framework via debugfs.
> >>>>> This allows you to inject a conditional error to specified
> >>>>> function using debugfs interfaces.
> >>>>>
> >>>>> Signed-off-by: Masami Hiramatsu <mhiramat@xxxxxxxxxx>
> >>>>> ---
> >>>>> Documentation/fault-injection/fault-injection.txt | 5 +
> >>>>> kernel/Makefile | 1
> >>>>> kernel/fail_function.c | 169 +++++++++++++++++++++
> >>>>> lib/Kconfig.debug | 10 +
> >>>>> 4 files changed, 185 insertions(+)
> >>>>> create mode 100644 kernel/fail_function.c
> >>>>>
> >>>>> diff --git a/Documentation/fault-injection/fault-injection.txt b/Documentation/fault-injection/fault-injection.txt
> >>>>> index 918972babcd8..6243a588dd71 100644
> >>>>> --- a/Documentation/fault-injection/fault-injection.txt
> >>>>> +++ b/Documentation/fault-injection/fault-injection.txt
> >>>>> @@ -30,6 +30,11 @@ o fail_mmc_request
> >>>>> injects MMC data errors on devices permitted by setting
> >>>>> debugfs entries under /sys/kernel/debug/mmc0/fail_mmc_request
> >>>>>
> >>>>> +o fail_function
> >>>>> +
> >>>>> + injects error return on specific functions by setting debugfs entries
> >>>>> + under /sys/kernel/debug/fail_function. No boot option supported.
> >>>>
> >>>> I like it.
> >>>> Could you document it a bit better?
> >>>
> >>> Yes, I will do in next series.
> >>>
> >>>> In particular retval is configurable, but without an example no one
> >>>> will be able to figure out how to use it.
> >>>
> >>> Ah, right. BTW, as I pointed in the covermail, should we store the
> >>> expected error value range into the injectable list? e.g.
> >>>
> >>> ALLOW_ERROR_INJECTION(open_ctree, -1, -MAX_ERRNO)
> >>>
> >>> And provide APIs to check/get it.
> >>
> >> I'm afraid such check would be too costly.
> >> Right now we have only two functions marked but I expect hundreds more
> >> will be added in the near future as soon as developers realize the
> >> potential of such error injection.
> >> All of ALLOW_ERROR_INJECTION marks add 8 byte overhead each to .data.
> >> Multiple by 1k and we have 8k of data spent on marks.
> >> If we add max/min range marks that doubles it for very little use.
> >> I think marking function only is enough.
> >
> > Sorry, I don't think so.
> > Even if it takes 16 bytes more for each points, I don't think it is
> > any overhead for machines in these days. Even if so, we can provide
> > a kconfig to reduce it.
> > I mean, we are living in GB-order memory are, and it will be bigger
> > in the future. Why we have to worry about hundreds of 16bytes memory
> > pieces? It will take a few KB, and even if we mark thousands of
> > functions, it never reaches 1MB, in GB memory pool. :)
> >
> > Of course, for many small-footprint embedded devices (like having
> > less than 128MB memory), this feature can be a overhead. But they
> > can cut off the table by kconfig.
>
> I still disagree on wasting 16-byte * num_of_funcs of .data here.
> The trade-off of usability vs memory just not worth it. Sorry.
> Please focus on testing your changes instead.

Then what happen if the user set invalid retval to those functions?
even if we limit the injectable functions, it can cause a problem,

for example,

obj = func_return_object();
if (!obj) {
handling_error...;
}
obj->field = x;

In this case, obviously func_return_object() must return NULL if there is
an error, not -ENOMEM. But without the correct retval information, how would
you check the BPF code doesn't cause a trouble?
Currently it seems you are expecting only the functions which return error code.

ret = func_return_state();
if (ret < 0) {
handling_error...;
}

But how we can distinguish those?

If we have the error range for each function, we can ensure what is
*correct* error code, NULL or errno, or any other error numbers. :)

At least fail_function needs this feature because it can check
return value when setting it up.

Thank you,

--
Masami Hiramatsu <mhiramat@xxxxxxxxxx>