Re: what's papered over by set_fs(USER_DS) in amd64 signal delivery?

From: Brian Gerst
Date: Fri Sep 24 2010 - 23:51:34 EST


On Fri, Sep 24, 2010 at 10:48 PM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> On Fri, Sep 24, 2010 at 10:25:15PM -0400, Brian Gerst wrote:
>> > +                 __asm__("mov %w0,%%fs ; mov %w0,%%gs":"=r" (seg) :"0" (seg));
>> > +                 set_fs(seg);
>> > +                 regs->xds = seg;
>> > +                 regs->xes = seg;
>> > +                 regs->xss = seg;
>> > +                 regs->xcs = USER_CS;
>> > in 2.1.2.  And that's when we had
>> >         * fs and gs evicted from pt_regs
>> >         * fs and gs not saved restored on kernel entry/exit
>> >         * just introduced set_fs() to start with (that went in 2.1.0)
>> >
>> > A bit before my time, so I'm not sure what's been going on there...
>>
>> I believe it can be safely removed.  Looking through the history, the
>> corresponding set_fs() calls were removed from 32-bit by commit
>> b93b6ca3.  This is just an artifact from ancient i386 code where
>> set_fs (which is grossly misnamed now) really did set the %fs
>> register.
>
> Not quite.  If you look at the tree where it has shown up (2.1.2), you'll see
> that
>        a) by that time it _wasn't_ an assignment to %fs
>        b) the same patch that has introduced that call there does direct
> assignment to %fs right next to that set_fs().  See that __asm__ above?
>
> Again, I agree that it almost certainly can be dropped.  I really wonder
> about the history, though.  It predates git and bk by far (late 1996).
> Linus, do you have any recollection regarding that stuff?
>

In the beginning, the i386 kernel used a non-flat segmented memory
layout. USER_[CD]S were 3GB segments at base 0, and KERNEL_[CD]S were
1GB segments at base 3GB. This meant that the kernel could not access
userspace addresses without an %fs segment override (%fs was saved
in pt_regs, reloaded with USER_DS on kernel entry, and restored on
kernel exit). To make the *_user functions address the kernel segment
instead, you had to reload %fs with KERNEL_DS.
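
To make the layout concrete, here is a rough userspace model of the
translation, not kernel code. The struct and function names are made
up for illustration; only the base and size values come from the
description above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of a segment: a base and a size in bytes.
 * This is a hypothetical illustration, not a real descriptor layout. */
struct seg_model {
    uint32_t base;
    uint32_t size;   /* offsets 0 .. size-1 are valid */
};

/* 3GB at base 0 for user, 1GB at base 3GB for kernel. */
static const struct seg_model USER_DS_MODEL   = { 0x00000000u, 0xC0000000u };
static const struct seg_model KERNEL_DS_MODEL = { 0xC0000000u, 0x40000000u };

/*
 * Translate a segment-relative offset to a linear address.
 * Returns false when the offset is past the segment limit, which is
 * where the hardware would fault.
 */
bool seg_to_linear(const struct seg_model *seg, uint32_t offset,
                   uint32_t *linear)
{
    if (offset >= seg->size)
        return false;
    *linear = seg->base + offset;
    return true;
}
```

The same user pointer value lands on a different linear address
depending on which segment it goes through, which is why the kernel
needed the %fs override to reach userspace at all.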

v2.1.2 introduced the modern flat memory layout with 4GB segments at
base 0. %fs was no longer used for userspace access, so it wasn't
saved in pt_regs or touched in any way until a task switch. Instead
of the hardware enforcing the limit, the check was moved to software.
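
A minimal sketch of what a software limit check looks like, again as
a userspace model rather than the actual 2.1.2 code. All of the
model_* names and the seg_limit_t type are hypothetical stand-ins;
the real kernel kept the limit per task and the details changed over
time.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-task limit; not a kernel type. */
typedef struct { uint64_t limit; } seg_limit_t;   /* exclusive upper bound */

#define MODEL_USER_DS   ((seg_limit_t){ 0xC0000000ull })   /* below 3GB only */
#define MODEL_KERNEL_DS ((seg_limit_t){ 0x100000000ull })  /* full 4GB range */

/* A real kernel stores this per task; one global is enough for a model. */
static seg_limit_t current_limit = { 0xC0000000ull };

/* Despite the name, this only changes the limit. No register is touched. */
void model_set_fs(seg_limit_t seg) { current_limit = seg; }

seg_limit_t model_get_fs(void) { return current_limit; }

/* Range check done in software before a user access. */
bool model_access_ok(uint32_t addr, uint32_t size)
{
    /* Widen to 64 bits so addr + size cannot wrap around. */
    return (uint64_t)addr + size <= current_limit.limit;
}
```

With the limit at MODEL_USER_DS, a range starting at 0xC0000000 is
rejected; after model_set_fs(MODEL_KERNEL_DS) the same range passes.
That is all the "segment" means once the layout is flat.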

Originally the signal delivery code had to set regs->xfs = USER_DS so
that the signal handler had a known state when it ran. That had nothing
to do with the kernel's userspace access mechanism. In 2.1.2 it was
converted to do both an immediate reload of the %fs register (since
%fs was no longer saved in pt_regs and wouldn't get restored on kernel
exit) and a new set_fs(USER_DS) call, which meant something completely
different. That is the origin of the code we are trying to remove
now.

--
Brian Gerst