External email: Use caution opening links or attachments
On Tue, Jun 10, 2025, at 11:50, Patrisious Haddad wrote:
On 6/10/2025 12:28 PM, Arnd Bergmann wrote:Generally speaking, kcalloc is fairly quick and won't fail here,
void mlx5_ib_ufile_hw_cleanup(struct ib_uverbs_file *ufile)Please preserve reverse Christmas tree deceleration.
{
- struct mlx5_async_cmd async_cmd[MAX_ASYNC_CMDS];
+ struct mlx5_async_cmd *async_cmd;
struct ib_ucontext *ucontext = ufile->ucontext;But honestly I'm not sure I like this, the whole point of this patch was
struct ib_device *device = ucontext->device;
struct mlx5_ib_dev *dev = to_mdev(device);
@@ -2678,6 +2678,10 @@ void mlx5_ib_ufile_hw_cleanup(struct ib_uverbs_file *ufile)
int head = 0;
int tail = 0;
+ async_cmd = kcalloc(MAX_ASYNC_CMDS, sizeof(*async_cmd), GFP_KERNEL);
+ if (WARN_ON(!async_cmd))
+ return;
performance optimization for teardown flow, and this function is called
in a loop not even one time.
So I'm really not sure about how much kcalloc can slow it down here, and
it failing is whole other issue.
but it can take some time under memory pressure if it ends up
in memory reclaim.
I'm thinking out-loud here, but theoretically we know stack size andThe thing I'm most interested here is the compile-time warning:
this struct size at compile time , so can we should be able to add some
kind of ifdef check "if (stack_frame_size < struct_size)" skip this
function and maybe print some warning.
(since it is purely optimization function and logically the code will
continue correctly without it - but if it needs to be executed then let
it stay like this and needs a big enough stack - which is most of today
systems anyway) ?
we currently have some configurations that have a very high warning
limit of 2048 bytes or even unlimited, which means that a number
of functions that accidentally use too much stack space (either from
a compiler misoptimization or a programmer error) are missed and
can end up causing problems later. I posted this patch as part of
a larger work to eventually reduce the default warning limit
for those corner cases.
The risk in this particular function to actually overflow is fairly
low since it gets called from sys_close() or __fput(), which
are not nested deeply. I can think of a couple of other ways to
keep your fast path and also build cleanly with a lower warning
limit.
- check which exact configurations actually trigger the high stack
usage and then skip the optimization in those cases. The most
likely causes are CONFIG_KASAN_STACK and CONFIG_KMSAN, both
of which already make the kernel a lot slower.
- reduce MAX_ASYNC_CMDS to always stay under the warning limit, either
picking a lower value unconditionally, or based on the Kconfig
options that trigger it
Can work but not ideal.
- preallocate the array as part of an existing structure, whichever
makes sense here (mlx5_ib_dev maybe?).
Sounds like the hardest option to implement but I have no quarrel with it.
- reorganize the code in some other form to have the stack not
blow the warning limit. As far as I can tell, I only see this
particular one with clang but not gcc, and that often means
it happens because of some particular inlining decisions that
clang takes, and we can force them by adding strategic
__always_inline or noinline annotations that make both compilers
do the same thing.
Arnd