Re: [BUG] Race between policy reload sidtab conversion and live conversion

From: Ondrej Mosnacek
Date: Mon Mar 01 2021 - 09:38:04 EST


On Sat, Feb 27, 2021 at 3:35 AM Hillf Danton <hdanton@xxxxxxxx> wrote:
> On Fri, 26 Feb 2021 12:19:35 +0100 Ondrej Mosnacek wrote:
> > On Fri, Feb 26, 2021 at 5:08 AM Hillf Danton <hdanton@xxxxxxxx> wrote:
> > > On Thu, 25 Feb 2021 20:06:45 -0500 Paul Moore wrote:
> > > > On Wed, Feb 24, 2021 at 4:35 AM Ondrej Mosnacek <omosnace@xxxxxxxxxx> wrote:
> > > > > After the switch to RCU, we now have:
> > > > > 1. Start live conversion of new entries.
> > > > > 2. Convert existing entries.
> > > > > 3. RCU-assign the new policy pointer to selinux_state.
> > > > > [!!! Now actually both old and new sidtab may be referenced by
> > > > > readers, since there is no synchronization barrier previously provided
> > > > > by the write lock.]
> > > > > 4. Wait for synchronize_rcu() to return.
> > > > > 5. Now only the new sidtab is visible to readers, so the old one can
> > > > > be destroyed.
> > > > >
> > > > > So the race can happen between 3. and 5., if one thread already sees
> > > > > the new sidtab and adds a new entry there, and a second thread still
> > > > > has the reference to the old sidtab and also tires to add a new entry;
> > > > > live-converting to the new sidtab, which it doesn't expect to change
> > > > > by itself. Unfortunately I failed to realize this when reviewing the
> > > > > patch :/
> > > >
> > > > It is possible I'm not fully understanding the problem and/or missing
> > > > an important detail - it is rather tricky code, and RCU can be very
> > > > hard to reason at times - but I think we may be able to solve this
> > > > with some lock fixes inside sidtab_context_to_sid(). Let me try to
> > > > explain to see if we are on the same page here ...
> > > >
> > > > The problem is when we have two (or more) threads trying to
> > > > add/convert the same context into a sid; the task with new_sidtab is
> > > > looking to add a new sidtab entry, while the task with old_sidtab is
> > > > looking to convert an entry in old_sidtab into a new entry in
> > > > new_sidtab. Boom.
> > > >
> > > > Looking at the code in sidtab_context_to_sid(), when we have two
> > > > sidtabs that are currently active (old_sidtab->convert pointer is
> > > > valid) and a task with old_sidtab attempts to add a new entry to both
> > > > sidtabs it first adds it to the old sidtab then it also adds it to the
> > > > new sidtab. I believe the problem is that in this case while the task
> > > > grabs the old_sidtab->lock, it never grabs the new_sidtab->lock which
> > > > allows it to race with tasks that already see only new_sidtab. I
> > > > think adding code to sidtab_context_to_sid() which grabs the
> > > > new_sidtab->lock when adding entries to the new_sidtab *should* solve
> > > > the problem.
> > > >
> > > > Did I miss something important? ;)
> > >
> > > If the convert pointer can be derefered without lock, we can opt to
> > > convert context after building sidtab with the risk of AB BA deadlock
> > > cut. Below is the minimum change I can think of along your direction.
> >
> > We could fix this a bit more easily by just having a shared spinlock
> > for both (well, *all*) sidtabs. Yes, we'd need to have it all the way
>
> Looking forward to reading how to add another BKL,

This is not even remotely comparable to the Big Kernel Lock... 99.9%
of the time, there will be just one sidtab, otherwise there will be
just two for a short time during the policy reload, which need to be
updated in sync anyway.

>
> > up in selinux_state and pass it through to sidtab_init(), but IMHO
> > that's less bad than trying to get it right with two locks.
>
> instead of a nearly pure code churn in order to walk around deadlock.

I wish it were that simple, but it won't work this way. Consider a
scenario like this:
1. Thread A acquires the old policy pointer, encounters a new context,
adds it to the old sidtab as X, but doesn't start adding it to the new
one yet.
2. Thread B acquires the new policy pointer, encounters a different
new context and adds it to the new sidtab as X.
[Now you end up with two different contexts assigned to the same SID,
depending on the policy]
3. Now thread A continues to add the context to the new sidtab, but
since it reuses the count variable from before, it overwrites the
entry for X.
(And even if you fixed that, re-searched the new sidtab, and added the
new context to the end, you'd end up with the context having different
SIDs under the old vs. new policy, which is again a messy situation.)

You see, moving the sidtab code around can get you in trouble and
that's why I'm advocating for using a common lock - it's much easier
to prove that such change doesn't create new bugs. And as I said
above, merging the locks won't really have performance consequences,
so I still believe it to be the better choice here.

--
Ondrej Mosnacek
Software Engineer, Linux Security - SELinux kernel
Red Hat, Inc.