Re: [PATCH v10 0/5] Introduce mseal

From: Suren Baghdasaryan
Date: Fri Apr 19 2024 - 12:54:38 EST


On Fri, Apr 19, 2024 at 3:15 PM Jeff Xu <jeffxu@xxxxxxxxxxxx> wrote:
>
> On Fri, Apr 19, 2024 at 7:57 AM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
> >
> > On Thu, Apr 18, 2024 at 6:22 PM Jeff Xu <jeffxu@xxxxxxxxxxxx> wrote:
> > >
> > > On Thu, Apr 18, 2024 at 1:19 PM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
> > > >
> > > > On Tue, Apr 16, 2024 at 12:40 PM Jeff Xu <jeffxu@xxxxxxxxxxxx> wrote:
> > > > >
> > > > > On Tue, Apr 16, 2024 at 8:13 AM Liam R. Howlett <Liam.Howlett@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > * jeffxu@xxxxxxxxxxxx <jeffxu@xxxxxxxxxxxx> [240415 12:35]:
> > > > > > > From: Jeff Xu <jeffxu@xxxxxxxxxxxx>
> > > > > > >
> > > > > > > > This is v10; it rebases the v9 patch onto 6.9-rc3.
> > > > > > > > We also applied and tested mseal() in Chrome and on Chromebook.
> > > > > > >
> > > > > > > ------------------------------------------------------------------
> > > > > > ...
> > > > > >
> > > > > > > MM perf benchmarks
> > > > > > > ==================
> > > > > > > This patch adds a loop in the mprotect/munmap/madvise(DONTNEED) to
> > > > > > > check the VMAs’ sealing flag, so that no partial update can be made,
> > > > > > > when any segment within the given memory range is sealed.
> > > > > > >
> > > > > > > To measure the performance impact of this loop, two tests are developed.
> > > > > > > [8]
> > > > > > >
> > > > > > > The first is measuring the time taken for a particular system call,
> > > > > > > by using clock_gettime(CLOCK_MONOTONIC). The second is using
> > > > > > > PERF_COUNT_HW_REF_CPU_CYCLES (exclude user space). Both tests have
> > > > > > > similar results.
> > > > > > >
> > > > > > > > The tests roughly follow the sequence below:
> > > > > > > > for (i = 0; i < 1000; i++)
> > > > > > > >     create 1000 mappings (1 page per VMA)
> > > > > > > >     start the sampling
> > > > > > > >     for (j = 0; j < 1000; j++)
> > > > > > > >         mprotect one mapping
> > > > > > > >     stop and save the sample
> > > > > > > >     delete 1000 mappings
> > > > > > > > calculate all samples.
> > > > > >
> > > > > >
> > > > > > Thank you for doing this performance testing.
> > > > > >
> > > > > > >
> > > > > > > > The tests below were performed on an Intel(R) Pentium(R) Gold 7505
> > > > > > > > @ 2.00GHz, 4G memory, Chromebook.
> > > > > > >
> > > > > > > Based on the latest upstream code:
> > > > > > > The first test (measuring time)
> > > > > > > > syscall__  vmas      t  t_mseal  delta_ns  per_vma     %
> > > > > > > > munmap__      1    909      944        35       35  104%
> > > > > > > > munmap__      2   1398     1502       104       52  107%
> > > > > > > > munmap__      4   2444     2594       149       37  106%
> > > > > > > > munmap__      8   4029     4323       293       37  107%
> > > > > > > > munmap__     16   6647     6935       288       18  104%
> > > > > > > > munmap__     32  11811    12398       587       18  105%
> > > > > > > > mprotect      1    439      465        26       26  106%
> > > > > > > > mprotect      2   1659     1745        86       43  105%
> > > > > > > > mprotect      4   3747     3889       142       36  104%
> > > > > > > > mprotect      8   6755     6969       215       27  103%
> > > > > > > > mprotect     16  13748    14144       396       25  103%
> > > > > > > > mprotect     32  27827    28969      1142       36  104%
> > > > > > > > madvise_      1    240      262        22       22  109%
> > > > > > > > madvise_      2    366      442        76       38  121%
> > > > > > > > madvise_      4    623      751       128       32  121%
> > > > > > > > madvise_      8   1110     1324       215       27  119%
> > > > > > > > madvise_     16   2127     2451       324       20  115%
> > > > > > > > madvise_     32   4109     4642       534       17  113%
> > > > > > >
> > > > > > > The second test (measuring cpu cycle)
> > > > > > > > syscall__  vmas    cpu  cmseal  delta_cpu  per_vma     %
> > > > > > > > munmap__      1   1790    1890        100      100  106%
> > > > > > > > munmap__      2   2819    3033        214      107  108%
> > > > > > > > munmap__      4   4959    5271        312       78  106%
> > > > > > > > munmap__      8   8262    8745        483       60  106%
> > > > > > > > munmap__     16  13099   14116       1017       64  108%
> > > > > > > > munmap__     32  23221   24785       1565       49  107%
> > > > > > > > mprotect      1    906     967         62       62  107%
> > > > > > > > mprotect      2   3019    3203        184       92  106%
> > > > > > > > mprotect      4   6149    6569        420      105  107%
> > > > > > > > mprotect      8   9978   10524        545       68  105%
> > > > > > > > mprotect     16  20448   21427        979       61  105%
> > > > > > > > mprotect     32  40972   42935       1963       61  105%
> > > > > > > > madvise_      1    434     497         63       63  115%
> > > > > > > > madvise_      2    752     899        147       74  120%
> > > > > > > > madvise_      4   1313    1513        200       50  115%
> > > > > > > > madvise_      8   2271    2627        356       44  116%
> > > > > > > > madvise_     16   4312    4883        571       36  113%
> > > > > > > > madvise_     32   8376    9319        943       29  111%
> > > > > > >
> > > > > >
> > > > > > If I am reading this right, madvise() is affected more than the other
> > > > > > calls? Is that expected or do we need to have a closer look?
> > > > > >
> > > > > The madvise() has a bigger percentage (per_vma %), but it also has a
> > > > > smaller base value (cpu).
> > > >
> > > > Sorry, it's unclear to me what the "vmas" column denotes. Is that how
> > > > many VMAs were created before timing the syscall? If so, then 32 is
> > > > the max that you show here while you seem to have tested with 1000
> > > > VMAs. What is the overhead with 1000 VMAs?
> > >
> > > The vmas column is the number of VMAs used in one call.
> > >
> > > For example: for 32 and mprotect(ptr,size), the memory range used in
> > > mprotect has 32 VMAs.
> >
> > Ok, so the 32 here denotes how many VMAs one mprotect() call spans?
> >
> Yes.
>
> > >
> > > It also matters how many memory ranges are in use at the time of the
> > > test. This is where 1000 comes in. The test creates 1000 memory
> > > ranges, each memory range has 32 VMAs, then calls mprotect on the 1000
> > > memory ranges. (The pseudocode was included in the original email.)
> >
> > So, if each range has 32 vmas and you have 1000 ranges then you are
> > creating 32000 vmas? Sorry, your pseudocode does not clarify that. My
> > current understanding is this:
> >
> > for (i = 0; i < 1000; i++)
> >     mmap N*1000 areas (N=[1-32])
> >     start the sampling
> >     for (j = 0; j < 1000; j++)
> >         mprotect N areas with one syscall
> >     stop and save the sample
> >     munmap N*1000 areas
> > calculate all samples.
> >
> > Is that correct?
> >
> Yes, there will be 32000 VMAs in the system.
>
> The pseudocode is correct in concept.
> The test implementation is slightly different: it uses mprotect to
> split the memory and make sure the VMAs don't merge. For details,
> reference [8] of the original email links to the test code.
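
For reference, a minimal sketch of what a single sample could look like.
This is not the test code from [8]. The helper names (now_ns,
map_split_range, time_mprotect) are made up for illustration, and
alternating PROT_READ and PROT_READ|PROT_WRITE on neighbouring pages is
only an assumption about how the split is done. It also leaves out the
1000 surrounding ranges and the outer sampling loop.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/* Current CLOCK_MONOTONIC time in nanoseconds. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Map nr_vmas pages, then make every second page read-only so that
 * neighbouring pages have different protections and cannot be merged
 * into one VMA. Returns NULL on failure.
 */
static void *map_split_range(size_t nr_vmas, size_t page)
{
	char *p;

	assert(nr_vmas > 0);
	p = mmap(NULL, nr_vmas * page, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL;
	for (size_t i = 1; i < nr_vmas; i += 2) {
		if (mprotect(p + i * page, page, PROT_READ)) {
			munmap(p, nr_vmas * page);
			return NULL;
		}
	}
	return p;
}

/*
 * Time one mprotect() call whose range spans nr_vmas VMAs.
 * Returns 0 and stores the elapsed time on success, -1 on failure.
 */
static int time_mprotect(size_t nr_vmas, uint64_t *elapsed_ns)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	void *p = map_split_range(nr_vmas, page);
	uint64_t start, end;
	int ret;

	if (!p)
		return -1;
	start = now_ns();
	ret = mprotect(p, nr_vmas * page, PROT_NONE);
	end = now_ns();
	munmap(p, nr_vmas * page);
	if (ret)
		return -1;
	*elapsed_ns = end - start;
	return 0;
}
```

Calling time_mprotect(32, &ns) would correspond to one data point for the
"mprotect 32" row, though a single call is far too noisy to compare with
the averaged numbers above.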

Ok, thanks for the clarifications. I don't think the overhead is high
enough to worry about.
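
To put numbers on that, the table columns can be recomputed from the two
measured values in each row. A small sketch of the arithmetic follows;
the struct and function names are hypothetical, and rounding to the
nearest integer is an assumption about how the table was produced.

```c
#include <assert.h>

/* One table row: VMAs per call, baseline value, value with mseal. */
struct sample {
	unsigned long vmas;
	unsigned long base;
	unsigned long sealed;
};

/* Absolute overhead per syscall (delta_ns or delta_cpu column). */
static unsigned long overhead_delta(const struct sample *s)
{
	assert(s->sealed >= s->base);
	return s->sealed - s->base;
}

/* Overhead per VMA, rounded to nearest (per_vma column). */
static unsigned long overhead_per_vma(const struct sample *s)
{
	assert(s->vmas > 0);
	return (overhead_delta(s) + s->vmas / 2) / s->vmas;
}

/* Sealed value as a percentage of baseline, rounded to nearest (% column). */
static unsigned long overhead_pct(const struct sample *s)
{
	assert(s->base > 0);
	return (100 * s->sealed + s->base / 2) / s->base;
}
```

For the mprotect row with 32 VMAs in the first table (27827 ns without
mseal, 28969 ns with), this gives 1142 ns per call, 36 ns per VMA, and
104%.
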
Thanks,
Suren.


>
> -Jeff