Re: [5.2 REGRESSION] Generic vDSO breaks seccomp-enabled userspace on i386

From: Andy Lutomirski
Date: Mon Jul 22 2019 - 19:47:52 EST


On Mon, Jul 22, 2019 at 4:28 PM Kees Cook <keescook@xxxxxxxxxxxx> wrote:
>
> On Mon, Jul 22, 2019 at 12:17:16PM -0700, Andy Lutomirski wrote:
> > On Mon, Jul 22, 2019 at 11:39 AM Kees Cook <keescook@xxxxxxxxxxxx> wrote:
> > >
> > > On Mon, Jul 22, 2019 at 08:31:32PM +0200, Thomas Gleixner wrote:
> > > > On Mon, 22 Jul 2019, Kees Cook wrote:
> > > > > Just so I'm understanding: the vDSO change introduced code to make an
> > > > > actual syscall on i386, which for most seccomp filters would be rejected?
> > > >
> > > > No. The old x86 specific VDSO implementation had a fallback syscall as
> > > > well, i.e. clock_gettime(). On 32bit clock_gettime() uses the y2038
> > > > endangered timespec.
> > > >
> > > > So when the VDSO was made generic we changed the internal data structures
> > > > to be 2038 safe right away. As a consequence the fallback syscall is not
> > > > clock_gettime(), it's clock_gettime64(). which seems to surprise seccomp.
> > >
> > > Okay, it's didn't add a syscall, it just changed it. Results are the
> > > same: conservative filters suddenly start breaking due to the different
> > > call. (And now I see why Andy's alias suggestion would help...)
> > >
> > > I'm not sure which direction to do with this. It seems like an alias
> > > list is a large hammer for this case, and a "seccomp-bypass when calling
> > > from vDSO" solution seems too fragile?
> > >
> >
> > I don't like the seccomp bypass at all. If someone uses seccomp to
> > disallow all clock_gettime() variants, there shouldn't be a back door
> > to learn the time.
> >
> > Here's the restart_syscall() logic that makes me want aliases: we have
> > different syscall numbers for restart_syscall() on 32-bit and 64-bit.
> > The logic to decide which one to use is dubious at best. I'd like to
> > introduce a restart_syscall2() that is identical to restart_syscall()
> > except that it has the same number on both variants.
>
> I've built a straw-man for this idea... but I have to say I don't
> like it. This can lead to really unexpected behaviors if someone
> were to have differing filters for the two syscalls. For example,
> let's say someone was doing a paranoid audit of 2038-unsafe clock usage
> and marked clock_gettime() with RET_KILL and marked clock_gettime64()
> with RET_LOG. This aliasing would make clock_gettime64() trigger with
> RET_KILL...

This particular issue is solvable:

> + /* Handle syscall aliases when result is not SECCOMP_RET_ALLOW. */
> + if (unlikely(action != SECCOMP_RET_ALLOW)) {
> + int alias;
> +
> + alias = seccomp_syscall_alias(sd->arch, sd->nr);
> + if (unlikely(alias != -1)) {
> + /* Use sd_local for an aliased syscall. */
> + if (sd != &sd_local) {
> + sd_local = *sd;
> + sd = &sd_local;
> + }
> + sd_local.nr = alias;
> +
> + /* Run again, with the alias, accepting the results. */
> + filter_ret = seccomp_run_filters(sd, &match);
> + data = filter_ret & SECCOMP_RET_DATA;
> + action = filter_ret & SECCOMP_RET_ACTION_FULL;

How about:

new_data = ...;
new_action = ...;
if (new_action == SECCOMP_RET_ALLOW) {
data = new_data;
action = new_action;
}

It might also be nice to allow a filter to say "hey, I want to set
this result and I do *not* want compatibility aliases applied", but
I'm not quite sure how to express that.

I don't love this whole concept, but I also don't have a better idea.


--Andy