Re: Flush the hold queue fall into an infinite loop.

From: cuigaosheng
Date: Thu Jan 13 2022 - 20:22:22 EST


I want to stop droping the logs into audit_hold_queue when the auditd is abnormal.it
seems that this modification goes against the design intent of audit_hold_queue. its
effect is similar to removing the audit_hold_queue.

diff --git a/kernel/audit.c b/kernel/audit.c
index 2a38cbaf3ddb..a8091b1a6587 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -748,6 +748,7 @@ static int kauditd_send_queue(struct sock *sk, u32 portid,
                                        (*err_hook)(skb);
                                if (rc == -EAGAIN)
                                        rc = 0;
+                               audit_default = AUDIT_OFF;
                                /* continue to drain the queue */
                                continue;
                        } else
@@ -755,6 +756,7 @@ static int kauditd_send_queue(struct sock *sk, u32 portid,
                } else {
                        /* skb sent - drop the extra reference and continue */
                        consume_skb(skb);
+                       audit_default = audit_enabled;
                        failed = 0;
                }
        }

在 2022/1/13 23:22, Paul Moore 写道:
On Thu, Jan 13, 2022 at 6:57 AM cuigaosheng <cuigaosheng1@xxxxxxxxxx> wrote:
When we add "audit=1" to the cmdline, kauditd will take up 100%
cpu resource.As follows:

configurations:
auditctl -b 64
auditctl --backlog_wait_time 60000
auditctl -r 0
auditctl -w /root/aaa -p wrx
shell scripts:
#!/bin/bash
i=0
while [ $i -le 66 ]
do
touch /root/aaa
let i++
done
mandatory conditions:

add "audit=1" to the cmdline, and kill -19 pid_number(for /sbin/auditd).

As long as we keep the audit_hold_queue non-empty, flush the hold queue will fall into
an infinite loop.

713 static int kauditd_send_queue(struct sock *sk, u32 portid,
714 struct sk_buff_head *queue,
715 unsigned int retry_limit,
716 void (*skb_hook)(struct sk_buff *skb),
717 void (*err_hook)(struct sk_buff *skb))
718 {
719 int rc = 0;
720 struct sk_buff *skb;
721 unsigned int failed = 0;
722
723 /* NOTE: kauditd_thread takes care of all our locking, we just use
724 * the netlink info passed to us (e.g. sk and portid) */
725
726 while ((skb = skb_dequeue(queue))) {
727 /* call the skb_hook for each skb we touch */
728 if (skb_hook)
729 (*skb_hook)(skb);
730
731 /* can we send to anyone via unicast? */
732 if (!sk) {
733 if (err_hook)
734 (*err_hook)(skb);
735 continue;
736 }
737
738 retry:
739 /* grab an extra skb reference in case of error */
740 skb_get(skb);
741 rc = netlink_unicast(sk, skb, portid, 0);
742 if (rc < 0) {
743 /* send failed - try a few times unless fatal error */
744 if (++failed >= retry_limit ||
745 rc == -ECONNREFUSED || rc == -EPERM) {
746 sk = NULL;
747 if (err_hook)
748 (*err_hook)(skb);
749 if (rc == -EAGAIN)
750 rc = 0;
751 /* continue to drain the queue */
752 continue;
753 } else
754 goto retry;
755 } else {
756 /* skb sent - drop the extra reference and continue */
757 consume_skb(skb);
758 failed = 0;
759 }
760 }
761
762 return (rc >= 0 ? 0 : rc);
763 }

When kauditd attempt to flush the hold queue, the queue parameter is &audit_hold_queue,
and if netlink_unicast(line 741 ) return -EAGAIN, sk will be NULL(line 746), so err_hook(kauditd_rehold_skb)
will be call. Then continue, skb_dequeue(line 726) and err_hook(kauditd_rehold_skb,line 733) will
fall into an infinite loop.
I don't really understand the value of audit_hold_queue, can we remove it, or stop droping the logs
into kauditd_rehold_skb when the auditd is abnormal?
Thanks Gaosheng for the bug report, I'm able to reproduce this and I'm
looking into it now. I'll report back when I have a better idea of
the problem and a potential fix.