Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion

From: Jarkko Sakkinen
Date: Tue Jan 31 2017 - 08:32:18 EST


On Mon, Jan 30, 2017 at 02:13:08PM -0800, James Bottomley wrote:
> On Mon, 2017-01-30 at 23:58 +0200, Jarkko Sakkinen wrote:
> > On Mon, Jan 30, 2017 at 08:04:55AM -0800, James Bottomley wrote:
> > > On Sun, 2017-01-29 at 19:52 -0500, Ken Goldman wrote:
> > > > On 1/27/2017 5:04 PM, James Bottomley wrote:
> > > >
> > > > > > Beware the nasty corner case:
> > > > > >
> > > > > > - Application asks for a session and gets 02000000
> > > > > >
> > > > > > - Time elapses and 02000000 gets forcibly flushed
> > > > > >
> > > > > > - Later, app comes back, asks for a second session and again
> > > > > > gets
> > > > > > 02000000.
> > > > > >
> > > > > > - App gets very confused.
> > > > > >
> > > > > > May it be better to close the connection completely, which
> > > > > > the
> > > > > > application can detect, than flush a session and give this
> > > > > > corner
> > > > > > case?
> > > > >
> > > > > if I look at the code I've written, I don't know what the
> > > > > session
> > > > > number is, I just save sessionHandle in a variable for later
> > > > > use
> > > > > (lets say to v1). If I got the same session number returned at
> > > > > a
> > > > > later time and placed it in v2, all I'd notice is that an
> > > > > authorization using v1 would fail. I'm not averse to killing
> > > > > the
> > > > > entire connection but, assuming you have fallback, it might be
> > > > > kinder simply to ensure that the operations with the reclaimed
> > > > > session fail (which is what the code currently does).
> > > >
> > > > My worry is that this session failure cannot be detected by the
> > > > application. An HMAC failure could cause the app to tell a user
> > > > that
> > > > they entered the wrong password. Misleading. On the TPM, it
> > > > could
> > > > trigger the dictionary attack lockout. For a PIN index, it could
> > > > consume a failure count. Killing a policy session that has e.g.,
> > > > a
> > > > policy signed term could cause the application to go back to some
> > > > external entity for another authorization signature.
> > > >
> > > > Let's go up to the stack. What's the attack?
> > > >
> > > > If we're worried about many simultaneous applications (wouldn't
> > > > that
> > > > be wonderful), why not just let startauthsession fail? The
> > > > application can just retry periodically.
> > >
> > > How in that scenario do we ensure that a session becomes available?
> > > Once that's established, there's no real difference between
> > > retrying
> > > the startauthsession in the kernel when we know the session is
> > > available and forcing userspace to do the retry except that the
> > > former
> > > has a far greater chance of success (and it's only about 6 lines of
> > > code).
> > >
> > > > Just allocate them in triples so there's no deadlock.
> > >
> > > Is this the application or the kernel? If it's the kernel, that
> > > adds a
> > > lot of complexity.
> > >
> > > > If we're worried about a DoS attack, killing a session just helps
> > > > the
> > > > attacker. The attacker can create a few connections and spin on
> > > > startauthsession, locking everyone out anyway.
> > >
> > > There are two considerations here: firstly we'd need to introduce a
> > > mechanism to "kill" the connection. Probably we'd simply error
> > > every
> > > command on the space until it was closed. The second is which
> > > scenario
> > > is more reasonable: Say the application simply forgot to flush the
> > > session and will never use it again. Simply reclaiming the session
> > > would produce no effect at all on the application in this scenario.
> > > However, I have no data to say what's likely.
> > >
> > > > ~~
> > > >
> > > > Also, let's remember that this is a rare application. Sessions
> > > > are
> > > > only needed for remote access (requiring encryption, HMAC or
> > > > salt),
> > > > or policy sessions.
> > >
> > > This depends what your threat model is. For ssh keys, you worry
> > > that
> > > someone might be watching, so you use HMAC authority even for a
> > > local
> > > TPM. In the cloud, you don't quite know where the TPM is, so again
> > > you'd use HMAC sessions ... however, in both use cases the sessions
> > > should be very short lived.
> > >
> > > > ~~
> > > >
> > > > Should the code also reserve a session for the kernel? Mark it
> > > > not
> > > > kill'able?
> > >
> > > At the moment, the kernel doesn't use sessions, so let's worry
> > > about
> > > that problem at the point it arises (if it ever arises).
> > >
> > > James
> >
> > It does. My trusted keys implementation actually uses sessions.
>
> But as I read the code, I can't find where the kernel creates a
> session. It looks like the session and hmac are passed in as option
> arguments, aren't they?

Yes. Sorry, I mixed up things.

> > I'm kind dilating to an opinion that we would leave this commit out
> > from the first kernel release that will contain the resource manager
> > with similar rationale as Jason gave me for whitelisting: get the
> > basic stuff in and once it is used with some workloads whitelisting
> > and exhaustion will take eventually the right form.
> >
> > How would you feel about this?
>
> As long as we get patch 1/2 then applications using sessions will
> actually work with spaces, so taking more time with 2/2 is fine by me.
>
> James

1/2 contains code that with a few iterations it is in the form that I'm
able to merge it.

With 2/2 I'm not saying it is wrong approach but I cannot yet say that
I'm confident that it would be the best approach.

I think that the transient object and infrastructure stuff that is
already in the patch set and 1/2 session is the subset of commits where
we can be fairly confident that we are doing the right thing.

I'll start preparing a patch set with this content without RFC tag.

/Jarkko