Re: [PATCH] KVM: selftests: Run dirty_log_perf_test on specific cpus

From: Sean Christopherson
Date: Wed Aug 17 2022 - 17:29:36 EST


On Wed, Aug 17, 2022, Vipin Sharma wrote:
> On Wed, Aug 17, 2022 at 10:25 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> >
> > > +static int parse_num(const char *num_str)
> > > +{
> > > + int num;
> > > + char *end_ptr;
> > > +
> > > + errno = 0;
> > > + num = (int)strtol(num_str, &end_ptr, 10);
> > > + TEST_ASSERT(num_str != end_ptr && *end_ptr == '\0',
> > > + "Invalid number string.\n");
> > > + TEST_ASSERT(errno == 0, "Conversion error: %d\n", errno);
> >
> > Is the paranoia truly necessary? What happens if parse_cpu_list() simply uses
> > atoi() and is passed garbage?
>
> On error atoi() returns 0, which is also a valid logical cpu number.

Lame.

> We need error checking here to make sure that the user really wants
> cpu 0 and it was not a mistake in typing.
> I was thinking of using parse_num API for other places as well instead
> of atoi() in dirty_log_perf_test.

Yes, definitely. And maybe give it a name like atoi_paranoid()?

> Yeah, it was either my almost duplicate functions or have the one
> function do two things via if-else. I am not happy with both
> approaches.
>
> I think I will pass an integer array which this parsing function will
> fill up and return an int denoting how many elements were filled. The
> caller then can use the array as they wish, to copy it in
> vcpu_to_lcpu_map or cpuset.

Eh, I doubt that'll be a net improvement, e.g. the CPUSET case will then need to
re-loop, which seems silly. If the exclusive cpuset vs. array is undesirable, we
could have the API require at least one instead of exactly one, i.e.

TEST_ASSERT(cpuset || vcpu_map);

...

cpu = atoi(cpustr);
TEST_ASSERT(cpu >= 0, "Invalid cpu number: %d\n", cpu);
if (vcpu_map)
vcpu_map[i++] = cpu;
if (cpuset)
CPU_SET(cpu, cpuset);

If we somehow end up with a third type of destination, then we can revisit this,
but that seems unlikely at this point.

> > > @@ -383,6 +450,26 @@ static void help(char *name)
> > > backing_src_help("-s");
> > > printf(" -x: Split the memory region into this number of memslots.\n"
> > > " (default: 1)\n");
> > > + printf(" -c: Comma separated values of the logical CPUs which will run\n"
> > > + " the vCPUs. Number of values should be equal to the number\n"
> > > + " of vCPUs.\n\n"
> > > + " Example: ./dirty_log_perf_test -v 3 -c 22,43,1\n"
> > > + " This means that the vcpu 0 will run on the logical cpu 22,\n"
> > > + " vcpu 1 on the logical cpu 43 and vcpu 2 on the logical cpu 1.\n"
> > > + " (default: No cpu mapping)\n\n");
> > > + printf(" -d: Comma separated values of the logical CPUs on which\n"
> > > + " dirty_log_perf_test will run. Without -c option, all of\n"
> > > + " the vcpus and main process will run on the cpus provided here.\n"
> > > + " This option also accepts a single cpu. (default: No cpu mapping)\n\n"
> > > + " Example 1: ./dirty_log_perf_test -v 3 -c 22,43,1 -d 101\n"
> > > + " Main application thread will run on logical cpu 101 and\n"
> > > + " vcpus will run on the logical cpus 22, 43 and 1\n\n"
> > > + " Example 2: ./dirty_log_perf_test -v 3 -d 101\n"
> > > + " Main application thread and vcpus will run on the logical\n"
> > > + " cpu 101\n\n"
> > > + " Example 3: ./dirty_log_perf_test -v 3 -d 101,23,53\n"
> > > + " Main application thread and vcpus will run on logical cpus\n"
> > > + " 101, 23 and 53.\n");
> > > puts("");
> > > exit(0);
> > > }
> >
> > > @@ -455,6 +550,13 @@ int main(int argc, char *argv[])
> > > }
> > > }
> > >
> >
> > I wonder if we should make -c and -d mutually exclusive. Tweak -c to include the
> > application thread, i.e. TEST_ASSERT(nr_lcpus == nr_vcpus+1) and require 1:1 pinning
> > for all tasks. E.g. allowing "-c ..., -d 0,1,22" seems unnecessary.
> >
>
> One downside I can think of will be if we add some worker threads
> which are not vcpus then all of those threads will end up running on a
> single cpu unless we edit this parsing logic again.

But adding worker threads also requires a code change, i.e. it won't require a
separate commit/churn. And if we get to the point where we want multiple workers,
it should be relatively straightforward to support pinning an arbitrary number of
workers, e.g.

enum memtest_worker_type {
MAIN_WORKER,
MINION_1,
MINION_2,
NR_MEMTEST_WORKERS,
}


TEST_ASSERT(nr_lcpus == nr_vcpus + NR_MEMTEST_WORKERS);

void spawn_worker(enum memtest_worker_type type, <function pointer>)
{
cpu_set_t cpuset;

CPU_ZERO(&cpuset);
CPU_SET(task_map[nr_vcpus + type], &cpuset);

<set affinity and spawn>
}

> Current implementation gives vcpus special treatment via -c and for
> the whole application via -d. This gives good separation of concerns
> via flags.

But they aren't separated, e.g. using -d without -c means vCPUs are thrown into
the same pool as worker threads. And if we ever do add more workers, -d doesn't
allow the user to pin workers 1:1 with logical CPUs.

Actually, if -c is extended to pin workers, then adding -d is unnecessary. If the
user wants to affine tasks to CPUs but not pin 1:1, it can do that with e.g. taskset.
What the user can't do is pin 1:1.

If we don't want to _require_ the caller to pin the main worker, then we could do

TEST_ASSERT(nr_lcpus >= nr_vcpus &&
nr_lcpus <= nr_vcpus + NR_MEMTEST_WORKERS);

to _require_ pinning all vCPUs, and allow but not require pinning non-vCPU tasks.