Re: [PATCH v2] x86: prevent gcc from emitting rep movsq/stosq for inlined ops

From: David Laight
Date: Mon Jun 09 2025 - 17:19:28 EST

In reply to: Uros Bizjak: "Re: [PATCH v2] x86: prevent gcc from emitting rep movsq/stosq for inlined ops"

On Thu, 5 Jun 2025 18:47:33 +0200
Mateusz Guzik <mjguzik@xxxxxxxxx> wrote:

> gcc is over eager to use rep movsq/stosq (starts above 40 bytes), which
> comes with a significant penalty on CPUs without the respective fast
> short ops bits (FSRM/FSRS).
>
> Another point is that even uarchs with FSRM don't necessarily have FSRS (Ice
> Lake and Sapphire Rapids don't).
>
> More importantly, rep movsq is not fast even if FSRM is present.

Which architecture is that?
I got exactly the same timings for 'rep movsb' and 'rep movsq' when
I ran some tests on Intel CPUs going back to Ivy Bridge.

I do need to redo them though. I've worked out how to time them
without using mfence/lfence, and that should give a reasonable
estimate of the setup cost.
(I can measure the data dependency of a single divide...)
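
For anyone wanting to repeat the comparison, here is a minimal userspace
sketch of the two copy variants. It is not the test harness referred to
above, and it does not implement the fence-free timing: it only averages
CPU time over a loop, so it cannot separate the setup cost from the rest.
The helper names (copy_rep_movsb, copy_rep_movsq, copies_match,
avg_ns_per_copy) are made up for illustration, and the non-x86 fallback to
memcpy() is only there so the file builds elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

typedef void (*copy_fn)(void *dst, const void *src, size_t len);

/* Hypothetical helper: byte-granular copy using 'rep movsb'. */
static void copy_rep_movsb(void *dst, const void *src, size_t len)
{
#if defined(__x86_64__)
	__asm__ __volatile__("rep movsb"
			     : "+D"(dst), "+S"(src), "+c"(len)
			     :
			     : "memory");
#else
	memcpy(dst, src, len);	/* fallback so the sketch builds off x86-64 */
#endif
}

/* Hypothetical helper: 8-byte words with 'rep movsq', tail with 'rep movsb'. */
static void copy_rep_movsq(void *dst, const void *src, size_t len)
{
#if defined(__x86_64__)
	size_t qwords = len / 8;
	size_t tail = len % 8;

	/* rdi/rsi are advanced by the first copy, so the tail continues there. */
	__asm__ __volatile__("rep movsq"
			     : "+D"(dst), "+S"(src), "+c"(qwords)
			     :
			     : "memory");
	__asm__ __volatile__("rep movsb"
			     : "+D"(dst), "+S"(src), "+c"(tail)
			     :
			     : "memory");
#else
	memcpy(dst, src, len);
#endif
}

/* Returns 1 if both variants copy exactly 'len' bytes (len <= 256) identically. */
static int copies_match(size_t len)
{
	unsigned char src[256], a[256], b[256];
	size_t i;

	if (len > sizeof(src))
		return 0;
	for (i = 0; i < sizeof(src); i++)
		src[i] = (unsigned char)(i * 7 + 1);
	memset(a, 0, sizeof(a));
	memset(b, 0, sizeof(b));
	copy_rep_movsb(a, src, len);
	copy_rep_movsq(b, src, len);
	/* Bytes past 'len' must still be zero in both destinations. */
	return memcmp(a, src, len) == 0 && memcmp(a, b, sizeof(a)) == 0;
}

/* Coarse average CPU time per call in ns; returns -1.0 on bad arguments. */
static double avg_ns_per_copy(copy_fn fn, size_t len, unsigned long iters)
{
	static unsigned char src[4096], dst[4096];
	clock_t t0, t1;
	unsigned long i;

	if (len > sizeof(src) || iters == 0)
		return -1.0;
	t0 = clock();
	for (i = 0; i < iters; i++)
		fn(dst, src, len);
	t1 = clock();
	return (double)(t1 - t0) / CLOCKS_PER_SEC * 1e9 / (double)iters;
}
```

Calling avg_ns_per_copy() with each variant for a range of lengths around
the 40 byte threshold mentioned in the patch gives a first comparison, but
clock() is far too coarse for single-copy costs, so treat the numbers as a
sanity check only.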

David
