Re: [PATCH] mm: limit THP alignment – performance gain observed in AI inference workloads
From: Dev Jain
Date: Fri Jun 27 2025 - 23:50:42 EST
On 27/06/25 9:00 pm, Lorenzo Stoakes wrote:
> +cc Vlastimil
> On Fri, Jun 27, 2025 at 04:09:16PM +0530, siddhartha@xxxxxxxx wrote:
>> Hi all,
>>
>> I wanted to share validation data from a Hugging Face-based AI
>> inferencing workload, which was significantly impacted by the THP
>> alignment logic introduced in commit efa7df3e3bb5.
>>
>> Using transformer models with dynamic input lengths on Intel Xeon
>> (Cooper Lake), we observed up to a 3200% throughput improvement after
>> applying the patch from Oct 2024:
>>
>>   mm: limit THP alignment of anonymous mappings to PMD-aligned sizes
> All congratulations are owed to Vlastimil Babka for doing this, cc'd :)
>
> I gather he enjoys novelty beer mugs as tokens of thanks ;)
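
For context, my reading of that patch is that it simply gates the THP
alignment on the mapping length. Below is a minimal userspace model of
the post-patch policy, assuming a 2 MiB PMD as on x86-64 - this is an
illustration of the condition, not the kernel code itself:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PMD_SIZE (2UL << 20)	/* assumed 2 MiB PMD, as on x86-64 */

/*
 * Model of the post-patch policy: THP-align an anonymous mapping only
 * when the caller passed no address hint and the requested length is
 * an exact multiple of PMD_SIZE.
 */
static bool wants_thp_alignment(uintptr_t hint, size_t len)
{
	return hint == 0 && (len % PMD_SIZE) == 0;
}

int main(void)
{
	size_t sizes[] = { 2UL << 20, (2UL << 20) + 4096, 64UL << 20 };

	for (int i = 0; i < 3; i++)
		printf("len=%zu -> PMD-align: %s\n", sizes[i],
		       wants_thp_alignment(0, sizes[i]) ? "yes" : "no");
	return 0;
}

In this model only the first and third sizes get aligned; the
2 MiB + 4 KiB case falls back to the regular unaligned path.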
I was wondering how the change can get us such a big optimization - the
alignment gains us at most one extra PMD-sized THP mapping per VMA. Is
there something else I am missing?

I ask because, when I was reading the code, I was wondering whether a
similar change could be made for mTHPs.
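
To make the placement visible, something along these lines could be
used to watch where the kernel puts anonymous mappings whose length is
just over a PMD. This is a hypothetical probe: it assumes the default
top-down mmap layout, and the cross-mapping pointer arithmetic is for
illustration only:

#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>

#define PMD_SIZE (2UL << 20)	/* assumed 2 MiB PMD */

int main(void)
{
	size_t len = PMD_SIZE + 4096;	/* deliberately not a PMD multiple */
	char *prev = NULL;

	for (int i = 0; i < 4; i++) {
		char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (p == MAP_FAILED) {
			perror("mmap");
			return 1;
		}
		/*
		 * With a top-down layout, each new mapping should sit
		 * just below the previous one unless it was rounded up
		 * to a PMD boundary, which leaves a gap.
		 */
		printf("region %d at %p, PMD-aligned: %s, gap: %ld bytes\n",
		       i, (void *)p,
		       ((uintptr_t)p % PMD_SIZE) ? "no" : "yes",
		       prev ? (long)(prev - (p + len)) : 0L);
		prev = p;
	}
	return 0;
}

With the pre-patch behavior each of these mappings would be rounded up
to a 2 MiB boundary, leaving gaps between them; with the Oct 2024 patch
they should pack back to back.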
>> Metrics:
>> - Model: BERT-base
>> - Inference engine: Transformers + ONNX Runtime
>> - Kernel: 6.6 vs patched 6.6.8
>> - Batch size: 8-32; input length: 64-512 tokens
>> - Metric: inference throughput (samples/sec)
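
As a sanity check when reproducing numbers like these, per-process THP
coverage can be sampled from the AnonHugePages field of
/proc/<pid>/smaps_rollup (available since Linux 4.14). A minimal
sketch, reading the calling process's own rollup:

#include <stdio.h>
#include <string.h>

int main(void)
{
	FILE *f = fopen("/proc/self/smaps_rollup", "r");
	char line[256];

	if (!f) {
		perror("fopen");
		return 1;
	}
	/* Print only the aggregated THP line, e.g.
	 * "AnonHugePages:   1048576 kB" */
	while (fgets(line, sizeof(line), f)) {
		if (strncmp(line, "AnonHugePages:", 14) == 0)
			fputs(line, stdout);
	}
	fclose(f);
	return 0;
}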
>> Thanks for the fix -- this change had real impact on a
>> production-relevant workload.
>>
>> Best Regards,
>> Siddhartha Sharma
>> ISV @ Kenip
>> Solution Link: https://www.intel.com/content/www/us/en/partner/showcase/offering/a5bHo00000045YUIAY/deadlock-clearance.html