Re: [PATCH v7 4/5] rust: add find_bit_benchmark_rust module.

From: Boqun Feng
Date: Fri Apr 25 2025 - 09:45:50 EST


On Fri, Apr 25, 2025 at 02:20:13PM +0200, Burak Emir wrote:
> On Fri, Apr 25, 2025 at 12:31 AM Boqun Feng <boqun.feng@xxxxxxxxx> wrote:
> >
> > On Thu, Apr 24, 2025 at 09:48:17AM -0700, Boqun Feng wrote:
> > > On Thu, Apr 24, 2025 at 06:45:33PM +0200, Burak Emir wrote:
> > > > On Wed, Apr 23, 2025 at 6:56 PM Yury Norov <yury.norov@xxxxxxxxx> wrote:
> > > > > So? Can you show your numbers?
> > > >
> > > > For now, I only have numbers that may not be very interesting:
> > > >
> > > > - for find_next_bit, find_next_zero_bit and find_next_zero_bit (sparse):
> > > > 22 ns/iteration in C, 32 ns/iteration in Rust.
> > > >
> > > > - for sparse find_next_bit (sparse):
> > > > 60 ns/iteration in C, 70 ns/iteration in Rust.
> > > >
> > > > This is a VM running nested in a VM. More importantly: the C helper
> > > > method is not inlined.
> > > > So we are likely measuring the overhead (plus the extra bounds checking).
> > > >
> > > > I would like to get cross-language inlining to work with thinLTO to
> > > > have a more realistic comparison.
> > > > However, that is not something that works out of the box.
> > > > I am looking at Gary Guo's patch for this:
> > > > https://lore.kernel.org/all/20250319205141.3528424-1-gary@xxxxxxxxxxx/
> > > > Currently, I get duplicate symbol errors.
> > > >
> > >
> > > You will need to add __rust_helper attribute for the new rust helpers
> > > introduce in your patches. See Gary's patch #2 for example.
> > >
> >
> > Here you go ;-)
> >
> > https://github.com/fbq/linux/tree/rust-inline-bitmap
> >
> > I rebased on the top of rust-next and applied your patches onto it. The
> > last one is the necessary bits that enables helper inlining for the new
> > APIs in your patch. There is also a "TMP" patch in-between, otherwise
> > INLINE_HELPERS won't work, we need to dig more of it. But anyway, it
> > works on my machine (TM):
> >
> > (on x86, in your test function)
> >
> > 000000000028b090 <_RNvNtNtCs3KHxpmQFgFb_6kernel6bitmap5tests40kunit_rust_wrapper_bitmap_set_clear_find>:
> > ...
> > 28b0dd: 48 0f ba 2b 11 btsq $0x11, (%rbx)
> >
> > ^ this is the "b.set_bit(17);" in bitmap_set_clear_find() test.
>
> Thanks Boqun! I had the same state and got things to build now with
> CONFIG_LTO_NONE.
> Your work helped me narrow down the possibilities.
>

Great! Is performance number any different?

> Gary's helper-inlining patch in combination of CONFIG_CLANG_LTO_THIN
> and CONFIG_FIND_BIT_BENCHMARK_RUST=y gives "duplicate symbol" build
> errors.
>
> It does not eve matter here - because _find_next_bit C functions are
> not helpers.
>

Yes, I noticed that, but in theory we should have the same performance
as C because C also make the call-jump to _find_next_bit() instead of
inlining it.

> For cross-language inlining, we would need to change the build system
> similar to Gary's helper-inlining series, but for *all* object files.
> That is what "-flto" does for clang, and "-Clinker-plugin-lto" would
> do for rustc.
> Instead of the hack of emitting a .bc file, we need all .o files to
> contain LLVM bitcode, regardless of whether they come from clang or
> rustc.
>

There are cases where we don't want a full LTO, and so having an option
to get rid of unnecessary calls to the helpers instead of a full LTO is
reasonable IMO. But of course, no reason not to make cross-language
inlining work under LTO.

Regards,
Boqun

> Thanks,
> Burak