RE: Kernel Internals

Hugo Van den Berg (hbe@cypres.nl)
Mon, 3 Feb 1997 10:59:08 +0100 ()


On Thu, 30 Jan 1997, James Mohr wrote:

>
> >In article <01BC07E5.06DD4180@jmohr.blitz.de> James Mohr <jimmo@blitz.net> writes:
> > First is where the PID comes from. I understood it to be the entry
> >in the task[] array. That matches the way other *NIXes do it. That
> >is, the PID is just the offset into the process table (task[]
> > array). However, I cannot find anything definitive that says this.
>
> >It is unlikely that this is the case, since PIDs run the range from 1
> >to 32767, or so. Most likely, the PID is an entry in the process
> >structure in the kernel. If you really want to track it down, look
> >through the code that implements fork() (kernel/fork.c), as that is
> >the only way that new processes are created.
>
> What does PID running in a specific range have to do with it? SCO has PIDs that run in a specific range *and* the size of the process table grows dynamically *and* the PID is the slot number in the process table. Also,
> isn't the task[] array the "process structure in the kernel"? Looking through the kernel source, I only see references to tasks and not to processes. So, what structure is the "process structure"?

What version of SCO are we talking about? I have several Open Server 3 and
5 machines running. They each have a limit of between 100 and 300
processes, depending on the machine size and requirements. The PIDs on
these machines run up to 32767 or thereabouts, though. I know for certain
that my process tables are not that big. Besides, if I want more processes
I need to reconfigure and relink the kernel, and then reboot.

> If not an offset in an array, but rather some arbitrary number, then the kernel (or whatever) has to search an average of half the table *each time* before it finds the right process when:
>
> - sending a kill to a particular process
> - doing a ps -p <PID>
> - passing the exit value of the process back to the parent
> - Any time you need information about a specific PID.

Yes, but this is in non-pageable memory, and it is only a short list. If you
sort the process list by PID you can do a binary search, which means searching
on the order of log(n) entries, not n/2.
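
To make the log(n) versus n/2 point concrete, here is a minimal sketch in
Python. It is not kernel code: the table of PIDs, its size, and the function
names find_linear and find_binary are all made up for illustration. Both
functions return the slot index together with the number of entries they had
to look at.

```python
def find_linear(pids, target):
    """Scan the table front to back; return (index, comparisons)."""
    for steps, pid in enumerate(pids, start=1):
        if pid == target:
            return steps - 1, steps
    return None, len(pids)


def find_binary(sorted_pids, target):
    """Binary search on a table sorted by PID; return (index, comparisons)."""
    lo, hi, steps = 0, len(sorted_pids), 0
    while lo < hi:
        mid = (lo + hi) // 2
        steps += 1
        if sorted_pids[mid] < target:
            lo = mid + 1      # target can only be in the upper half
        else:
            hi = mid          # target is at mid or in the lower half
    if lo < len(sorted_pids) and sorted_pids[lo] == target:
        return lo, steps
    return None, steps


# Hypothetical table: 256 slots holding sparse, sorted PIDs.
table = list(range(100, 100 + 256 * 3, 3))
```

With this 256-entry table, looking up the last PID costs 256 comparisons
with the linear scan and fewer than ten with the binary search. The price is
that the table has to be kept sorted whenever a process is created or exits.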

> > I am unclear about the differences between bdflush and update. I
> > understand that bdflush is "part" of update and per the man-page is
> > "called by a user without superuser privileges." Do I take this to
> > mean that processes not owned by root call bdflush and the others
> > call update?
>
> >I don't understand it fully myself, but it is all part of a system
> >process that you must have (in order to get changed buffers flushed
> >back to disk regularly) but otherwise you don't have to worry about
> >it.
>
> Sorry, Dale, the question was not whether I should worry about them. I
> know about buffer flushing and why it is done. SCO just has bdflush, not
> update.
> What is so different about the way Linux handles buffer flushing that
> requires two daemons?

As far as I know, SCO does a lot of stuff in kernel space that can easily
be handled in userspace. Linux moves part of it out, hence the two daemons.

> > Some one said that "Clearing the process table slot of an exiting
> > process is not the responsibility of init, but of the parent. If
> > all the forefathers have died, init will take over." To me that
> > says that if I write a simple "Hello, World!" program, it will have
> > the code to clean up the process table. (maybe in a dynamically
> > linked library) To me that is system work and the parent process
> > should do it.
>
> >The process table slot is cleared when the "parent" process does a
> >wait() call which gets the information regarding the exiting of the
> >process. If a process's parent exits, all of its child processes
> >automagically become children of process 1, which is init. Since init
> >almost always is doing a wait() (look at sys_wait4() in
> >kernel/exit.c), this happens promptly after the process exits.
>
> Here again, that wasn't the question. I know when the process table
> entry is cleared, I know what is kept in the process table entry
> after the process dies, and I know what happens when there is no parent
> waiting on the child. So, to put the question as clear as I can:
>
> Is it true that the parent process is responsible for clearing the
> process table of child processes or is this done by init or some other
> process, kernel function, whatever?

Normally the kernel handles this after wait() completes. I expect Linux to
do this as well.
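
From the parent's side this can be sketched with the standard fork/wait
calls, here through Python's os module on a Unix system. The function name
spawn_and_reap and the exit code 7 are made up for the example. The parent
only asks for the child's status; releasing the slot is left to the kernel.

```python
import os


def spawn_and_reap(exit_code=7):
    """Fork a child that exits at once, then collect it with waitpid()."""
    pid = os.fork()
    if pid == 0:
        # Child: exit immediately, skipping Python-level cleanup.
        os._exit(exit_code)

    # Parent: until this call returns, the dead child stays in the
    # process table as a zombie so its exit status can be handed over.
    reaped_pid, status = os.waitpid(pid, 0)
    code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else None

    # Once collected, the child is gone: a second wait on it fails.
    try:
        os.waitpid(pid, 0)
        still_known = True
    except ChildProcessError:
        still_known = False

    return pid, reaped_pid, code, still_known
```

Nothing in the parent touches the process table directly. It calls wait,
gets the status back, and afterwards the PID is no longer known as its child.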

> >When I do a ps, I see that more than one process is waiting on
> >read_chan. No problem. What annoys me is that when I look at the
> >numeric output for the WCHAN, they are all the same one. Other
> >UNIXes will have a different WCHAN for each tty that is being
> > waited upon. Therefore, the number here is different.
>
> >That is a statement, not a question. There is probably no reason for
> >WCHAN values to be handled similarly in different Unixes.
>
> Wait channels are essentially the address of the routine that the
> process was at when it went to sleep. The values *are* handled the same
> in different Unixes as all UNIX (that I know of) use wait channels and
> they serve the same function. The question is "why would the WCHAN be
> the same for all processes?"

I think it is because Linux makes heavy use of shared code where other Unixes
load the same code over and over again. I know SCO does this: for example,
if I add a serial board my kernel grows substantially (and I need to
reconfigure/recompile/reboot 8-( ). On Linux the same code can handle 1 or
64 serial ports, with the same configuration. So if you use
the same code for all your ttys, the wait-for-the-next-char-typed
routine, which is the one most ttys spend most of their time waiting in,
will be the wchan for all (or at least most) ttys.

> In SCO, the virtual memory of each process between 3Gb-4Gb is for
> portions of the kernel that the process is using. So, when I am using a
> device, I have a particular driver loaded that ends up in the 3-4Gb
> range. Although two (or more) processes are waiting on the same event
> (i.e. input from the keyboard) and the WCHAN maps to the same function,
> the numeric WCHAN value is different. In Linux, both the numeric value
> and the address mapping is the same.

Memory mapping in SCO and Linux is quite different. One of the drawbacks
of the SCO method is that you can't equip a SCO machine with more than 3GB
of physical memory. Why did they reserve a full GB anyway? This is the
original mistake IBM made with the PC (640K-1M reserved), which is still
bugging us. Then again, if I put down a 4GB machine I won't equip it with
SCO anyway ;-)

> This brings up the question of how kernel address space is mapped into
> the user's process space.

I don't know, sorry.

--------------------------------------
Hugo Van den Berg - hbe@cypres.nl
Phone - +31 (0)30 - 60 25 400
Fax - +31 (0)30 - 60 50 799
--------------------------------------