Merge of David's ptrace branch. Summary:
o Support for ptrace T_ATTACH/T_DETACH and T_SYSCALL
o PM signal handling logic should now work properly, even with debuggers
being present
o Asynchronous PM/VFS protocol, full IPC support for senda(), and
AMF_NOREPLY senda() flag
DETAILS
Process stop and delay call handling of PM:
o Added sys_runctl() kernel call with sys_stop() and sys_resume()
aliases, for PM to stop and resume a process
o Added exception for sending/syscall-traced processes to sys_runctl(),
and matching SIGKREADY pseudo-signal to PM
o Fixed PM signal logic to deal with requests from a process after
stopping it (so-called "delay calls"), using the SIGKREADY facility
o Fixed various PM panics due to race conditions with delay calls versus
VFS calls
o Removed special PRIO_STOP priority value
o Added SYS_LOCK RTS kernel flag, to stop an individual process from
running while modifying its process structure
Signal and debugger handling in PM:
o Fixed debugger signals being dropped if a second signal arrives when
the debugger has not retrieved the first one
o Fixed debugger signals being sent to the debugger more than once
o Fixed debugger signals unpausing process in VFS; removed PM_UNPAUSE_TR
protocol message
o Detached debugger signals from general signal logic and from being
blocked on VFS calls, meaning that even VFS can now be traced
o Fixed debugger being unable to receive more than one pending signal in
one process stop
o Fixed signal delivery being delayed needlessly when multiple signals
are pending
o Fixed wait test for tracer, which was returning for children that were
not waited for
o Removed second parallel pending call from PM to VFS for any process
o Fixed process becoming runnable between exec() and debugger trap
o Added support for notifying the debugger before the parent when a
debugged child exits
o Fixed debugger death causing child to remain stopped forever
o Fixed consistently incorrect use of _NSIG
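Several of the debugger fixes above (signals dropped when a second one arrives, signals reported more than once, only one signal retrievable per stop) follow from using a single slot for the signal the tracer has not yet fetched. The sketch below shows one way to avoid all three by keeping a per-process bit set and clearing each bit when it is reported. It is only an illustration: struct traced, trace_post() and trace_fetch() are hypothetical names, not PM's actual data structures.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-process record of signals waiting for the tracer. */
struct traced {
	uint32_t trace_pending;	/* one bit per signal not yet reported */
};

/* Record a signal for the debugger. A second signal no longer overwrites
 * the first one. Returns 1 if the signal was not already pending. */
int trace_post(struct traced *t, int sig)
{
	uint32_t bit = (uint32_t)1 << sig;
	int fresh = (t->trace_pending & bit) == 0;

	t->trace_pending |= bit;
	return fresh;
}

/* Hand the lowest-numbered pending signal to the debugger exactly once,
 * so several signals can be fetched during one stop. Returns 0 when
 * nothing is pending. */
int trace_fetch(struct traced *t)
{
	int sig;

	for (sig = 1; sig < 32; sig++) {
		uint32_t bit = (uint32_t)1 << sig;
		if (t->trace_pending & bit) {
			t->trace_pending &= ~bit;
			return sig;
		}
	}
	return 0;
}
```

A signal posted twice before the tracer fetches it is reported once, and each fetch clears what it reports.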
Extensions to ptrace():
o Added T_ATTACH and T_DETACH ptrace request, to attach and detach a
debugger to and from a process
o Added T_SYSCALL ptrace request, to trace system calls
o Added T_SETOPT ptrace request, to set trace options
o Added TO_TRACEFORK trace option, to attach automatically to children
of a traced process
o Added TO_ALTEXEC trace option, to send SIGSTOP instead of SIGTRAP upon
a successful exec() of the tracee
o Extended T_GETUSER ptrace support to allow retrieving a process's priv
structure
o Removed T_STOP ptrace request again, as it does not help implementing
debuggers properly
o Added MINIX3-specific ptrace test (test42)
o Added proper manual page for ptrace(2)
Asynchronous PM/VFS interface:
o Fixed asynchronous messages not being checked when receive() is called
with an endpoint other than ANY
o Added AMF_NOREPLY senda() flag, preventing such messages from
satisfying the receive part of a sendrec()
o Added asynsend3() that takes optional flags; asynsend() is now a
#define passing in 0 as third parameter
o Made PM/VFS protocol asynchronous; reintroduced tell_fs()
o Made PM_BASE request/reply number range unique
o Hacked in a horrible temporary workaround into RS to deal with newly
revealed RS-PM-VFS race condition triangle until VFS is asynchronous
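The asynsend()/asynsend3() change is a plain macro wrapper, and AMF_NOREPLY is a flag tested by the receiving side. A minimal sketch, assuming a hypothetical stand-in body for asynsend3() that only records its arguments (the real one queues the message for senda()). The AMF_NOREPLY value and the may_satisfy_sendrec() helper are also assumptions made for illustration.

```c
#include <assert.h>

typedef int endpoint_t;
typedef struct { int m_type; } message;

/* Hypothetical value; the real AMF_NOREPLY is defined in the MINIX
 * headers and may differ. */
#define AMF_NOREPLY 0x10

/* Hypothetical stand-in for asynsend3(): it only records its arguments
 * instead of queueing an asynchronous message for senda(). */
static endpoint_t last_dst;
static int last_flags;

int asynsend3(endpoint_t dst, message *mp, int flags)
{
	(void) mp;
	last_dst = dst;
	last_flags = flags;
	return 0;
}

/* Existing callers keep the two-argument form, which passes 0 flags. */
#define asynsend(dst, mp) asynsend3((dst), (mp), 0)

/* Hypothetical check: an AMF_NOREPLY message must not be accepted as the
 * reply that a process blocked in sendrec() is waiting for. */
int may_satisfy_sendrec(int flags)
{
	return (flags & AMF_NOREPLY) == 0;
}
```

Keeping asynsend() as a macro means existing call sites compile unchanged, while new callers can pass AMF_NOREPLY explicitly.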
System signal handling:
o Fixed shutdown logic of device drivers; removed old SIGKSTOP signal
o Removed is-superuser check from PM's do_procstat() (aka getsigset())
o Added sigset macros to allow system processes to deal with the full
signal set, rather than just the POSIX subset
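The POSIX sigset operations only have to cover the signals a user program can see, while system processes also deal with kernel-generated signals numbered beyond that range. A sketch of what "full" sigset macros could look like, assuming a plain 32-bit mask. The names FULL_NSIG, full_sigset_t and full_sig*() are hypothetical and not the macros MINIX actually added.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: number of signals including system-only ones. */
#define FULL_NSIG 32

/* One bit per signal number 1..FULL_NSIG-1. */
typedef uint32_t full_sigset_t;

#define full_sigemptyset(s)    (*(s) = 0)
#define full_sigaddset(s, n)   (*(s) |= (full_sigset_t)1 << (n))
#define full_sigdelset(s, n)   (*(s) &= ~((full_sigset_t)1 << (n)))
#define full_sigismember(s, n) ((*(s) >> (n)) & 1u)
```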
Miscellaneous PM fixes:
o Split do_getset into do_get and do_set, merging common code and making
structure clearer
o Fixed setpriority() being able to put to sleep processes using an
invalid parameter, or revive zombie processes
o Made find_proc() global; removed obsolete proc_from_pid()
o Cleanup here and there
Also included:
o Fixed false-positive boot order kernel warning
o Removed last traces of old NOTIFY_FROM code
THINGS OF POSSIBLE INTEREST
o It should now be possible to run PM at any priority, even lower than
user processes
o No assumptions are made about communication speed between PM and VFS,
although communication must be FIFO
o A debugger will now receive incoming debuggee signals at kill time
only; the process may not yet be fully stopped
o A first step has been made towards making the SYSTEM task preemptible
Tomas Hruby [Tue, 29 Sep 2009 20:13:41 +0000 (20:13 +0000)]
Mostly a revert of r5306. com.h defines MAX_NR_TASKS value which replaces
NR_TASKS in the endpoint macros. MAX_NR_TASKS defines the maximal number of
kernel tasks. It is unlikely that we will ever need this many tasks as the goal
is not to have such a difference in the future. For now it makes it possible to
remove the limiting NR_TASKS from the endpoint code.
Tomas Hruby [Tue, 29 Sep 2009 18:47:56 +0000 (18:47 +0000)]
Removed macros that depend on NOTIFY_FROM from servers and drivers. Servers and
drivers now determine the information these macros provided from the m_source
field of the notify message.
Tomas Hruby [Thu, 24 Sep 2009 16:00:59 +0000 (16:00 +0000)]
ps fix
It removes the no-longer-existing macros (XPIPE, XPOPEN, XDOPEN, XLOCK, XSELECT) and
replaces them with the new ones from servers/vfs/const.h. No more dependency on the
NR_TASKS macro.
Fixed compilation errors in ps.c and rs/manager.c. The former was fixed by disabling code that uses no-longer-existent flags and the latter by removing the spurious parameter i from sys_privctl.
Tomas Hruby [Tue, 22 Sep 2009 21:48:26 +0000 (21:48 +0000)]
Removed dependency of vfs on NR_TASKS macro
- all macros in consts.h that depend on NR_TASKS replaced by FP_BLOCKED_ON_* values
- fp_suspended removed and replaced by fp_blocked_on. Testing whether a process
is suspended is equivalent to testing whether fp_blocked_on is FP_BLOCKED_ON_NONE
or not
- fp_task is valid only if fp_blocked_on == FP_BLOCKED_ON_OTHER
- no need for special values that do not collide with valid and special
endpoints, since they are not used as endpoints anymore
- suspend only takes FP_BLOCKED_ON_* values, not endpoints anymore
- suspend(task) replaced by wait_for(task), which sets fp_task so we remember who
we are waiting for, and suspend sets fp_blocked_on to FP_BLOCKED_ON_OTHER to
signal that we are waiting for some other process
- some functions should take endpoint_t instead of int, fixed
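The new bookkeeping separates *why* a process is blocked (fp_blocked_on) from *whom* it waits for (fp_task, meaningful only for FP_BLOCKED_ON_OTHER). A sketch of that split, assuming simplified signatures: the real suspend() and wait_for() in VFS work on the current process rather than taking a struct fproc pointer, the constant values are illustrative, and is_suspended() is a hypothetical helper.

```c
#include <assert.h>

typedef int endpoint_t;

/* Illustrative subset of FP_BLOCKED_ON_* values; the real list lives in
 * servers/vfs/const.h and its numbers may differ. */
#define FP_BLOCKED_ON_NONE   0
#define FP_BLOCKED_ON_PIPE   1
#define FP_BLOCKED_ON_OTHER  2

struct fproc {
	int fp_blocked_on;	/* FP_BLOCKED_ON_* reason, never an endpoint */
	endpoint_t fp_task;	/* valid only if fp_blocked_on == FP_BLOCKED_ON_OTHER */
};

/* Suspend on a reason code rather than on an endpoint. */
static void suspend(struct fproc *fp, int why)
{
	fp->fp_blocked_on = why;
}

/* Remember whom we wait for, then mark the process as blocked on
 * another process. */
static void wait_for(struct fproc *fp, endpoint_t who)
{
	fp->fp_task = who;
	suspend(fp, FP_BLOCKED_ON_OTHER);
}

/* Replaces the old fp_suspended flag. */
static int is_suspended(const struct fproc *fp)
{
	return fp->fp_blocked_on != FP_BLOCKED_ON_NONE;
}
```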
Tomas Hruby [Tue, 22 Sep 2009 21:46:47 +0000 (21:46 +0000)]
NOT_REACHABLE macro
- marks code path that should be unreachable (never executed)
- if hit, panics and reports the problem
- the end of main() marked as such. The SMP changes need some magic with stack
switching before the APs can be started, as they need to run on the boot stack
before figuring out which stack is their own. As main() uses the boot stack too,
we need to switch to the stack of the BSP before executing the last part of
main(), which needs to be in a separate function so we can jump to it.
Therefore restart() won't be the last call in main(), which may be confusing.
The macro can/should be used in other such places too.
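A minimal sketch of what a NOT_REACHABLE-style macro does, assuming a user-space setting: it reports the file and line and then stops. The kernel version would call its own panic(); fprintf() and abort() stand in for it here, and sign_class() is just a hypothetical example of a function whose end is unreachable.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Report where the "impossible" path was hit, then stop. A kernel would
 * use panic() instead of abort(). */
#define NOT_REACHABLE do {						\
		fprintf(stderr, "NOT_REACHABLE at %s:%d\n",		\
			__FILE__, __LINE__);				\
		abort();						\
	} while (0)

int sign_class(int x)
{
	if (x < 0)
		return -1;
	if (x == 0)
		return 0;
	if (x > 0)
		return 1;
	NOT_REACHABLE;	/* every case is handled above */
}
```

In main() the same idea would read `restart(); NOT_REACHABLE;`, documenting that control never returns from restart().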
Tomas Hruby [Tue, 22 Sep 2009 21:45:26 +0000 (21:45 +0000)]
Removed NR_TASKS from macros manipulating endpoint_t
- the magic numbers ANY, NONE and SELF are kept for compatibility with the
current userspace. This is OK as long as NR_PROCS is greater, so they don't
collide with other endpoints
- the 32 bit endpoint_t value is split in half, lower 16 bits for process slot
number and upper half for generation number
- transition to a structured endpoint_t in the future possible
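The 16/16 split can be expressed with a few shift-and-mask macros. This sketch only handles non-negative slot numbers and small generation numbers; the macro names are hypothetical and the real MINIX endpoint macros may treat kernel tasks and wrap-around differently.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t endpoint_t;

/* Lower 16 bits: process slot number. Upper 16 bits: generation. */
#define ENDPOINT_SLOT_BITS  16
#define ENDPOINT_SLOT_MASK  ((1 << ENDPOINT_SLOT_BITS) - 1)

#define make_endpoint(gen, slot)					\
	((endpoint_t)(((uint32_t)(gen) << ENDPOINT_SLOT_BITS) |	\
		      ((uint32_t)(slot) & ENDPOINT_SLOT_MASK)))
#define endpoint_slot(e)  ((int)((uint32_t)(e) & ENDPOINT_SLOT_MASK))
#define endpoint_gen(e)   ((int)((uint32_t)(e) >> ENDPOINT_SLOT_BITS))
```

Because the slot and generation occupy fixed bit fields, decoding no longer depends on NR_TASKS at all.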
Ben Gras [Mon, 21 Sep 2009 14:49:49 +0000 (14:49 +0000)]
- pages that point to page directory values of all processes,
shared with the kernel, mapped into kernel address space;
kernel is notified of its location. kernel segment size is
increased to make it fit.
- map in kernel and other processes that don't have their
own page table using single 4MB (global) mapping.
- new sanity check facility: objects that are allocated with
the slab allocator are, when running with sanity checking on,
marked readonly until they are explicitly unlocked using the USE()
macro.
- another sanity check facility: collect all uses of memory and
see if they don't overlap with (a) each other and (b) free memory
- own munmap() and munmap_text() functions.
- exec() recovers from out-of-memory conditions properly now; this
solves some weird exec() behaviour
- chew off memory from the same side of the chunk as where we
start scanning, solving some memory fragmentation issues
- use avl trees for freelist and phys_ranges in regions
- implement most useful part of munmap()
- remap() stuff is GQ's for shared memory
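The overlap sanity check can be done by putting every known use of memory, plus the free ranges, into one list, sorting it by start address, and comparing neighbours. After sorting, any overlapping pair implies an overlapping adjacent pair, so one linear scan suffices. This is a sketch of that general technique, not VM's actual code; struct range and ranges_overlap() are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* A memory range [start, start + len). */
struct range {
	unsigned long start;
	unsigned long len;
};

static int cmp_start(const void *a, const void *b)
{
	const struct range *ra = a, *rb = b;

	return (ra->start > rb->start) - (ra->start < rb->start);
}

/* Return 1 if any two ranges overlap, 0 otherwise. Sorts r in place. */
int ranges_overlap(struct range *r, size_t n)
{
	size_t i;

	qsort(r, n, sizeof(*r), cmp_start);
	for (i = 1; i < n; i++)
		if (r[i - 1].start + r[i - 1].len > r[i].start)
			return 1;
	return 0;
}
```

Adjacent ranges that merely touch, such as [0x0, 0x1000) and [0x1000, 0x2000), are not reported as overlapping.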
Ben Gras [Mon, 21 Sep 2009 14:48:19 +0000 (14:48 +0000)]
- Introduce some macros for field names, so that endpt, pendpt,
addr and taddr don't have to be defined any more, so that <sys/mman.h>
can be included for proper prototypes of munmap() and friends.
- rename our GETPID to MINIX_GETPID to avoid a name conflict with
other sources
- PM needs its own munmap() and munmap_text() to avoid sending messages
to VM at the startup phase. It *does* want to do that, but only
after initialising. So they're called again with unmap_ok set to 1
later.
- getnuid(), getngid() implementation
Ben Gras [Mon, 21 Sep 2009 14:47:51 +0000 (14:47 +0000)]
- No maximum block size any more.
- If allocation of a new buffer fails, use an already-allocated
unused buffer if available (low memory conditions)
- Allocate buffers dynamically, so memory isn't wasted on wrong-sized
buffers.
- No more _MAX_BLOCK_SIZE.
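The low-memory fallback can be sketched as "allocate a fresh buffer of exactly the needed size; if that fails, reuse an allocated but idle buffer that is large enough". Everything here is hypothetical: struct buf, get_buf() and the injected allocator (used so the out-of-memory path can be exercised) are illustrations, not the buffer cache's real interface.

```c
#include <assert.h>
#include <stdlib.h>

struct buf {
	void *data;	/* NULL means this descriptor has no buffer yet */
	size_t size;
	int in_use;
};

typedef void *(*alloc_fn)(size_t);

/* Prefer a fresh, exactly sized buffer; under memory pressure fall back
 * to an allocated but unused buffer that is big enough. */
struct buf *get_buf(struct buf *pool, size_t n, size_t size, alloc_fn alloc)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (pool[i].data == NULL) {
			void *p = alloc(size);
			if (p != NULL) {
				pool[i].data = p;
				pool[i].size = size;
				pool[i].in_use = 1;
				return &pool[i];
			}
			break;	/* allocation failed: low memory */
		}
	}

	for (i = 0; i < n; i++) {
		if (pool[i].data != NULL && !pool[i].in_use &&
		    pool[i].size >= size) {
			pool[i].in_use = 1;
			return &pool[i];
		}
	}
	return NULL;
}

/* Simulates an out-of-memory condition. */
static void *fail_alloc(size_t size)
{
	(void) size;
	return NULL;
}
```

Allocating exactly the requested size avoids wasting memory on wrong-sized buffers; the fallback trades that precision for progress when memory is short.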
Ben Gras [Mon, 21 Sep 2009 14:31:52 +0000 (14:31 +0000)]
Primary goal for these changes is:
- no longer have kernel have its own page table that is loaded
on every kernel entry (trap, interrupt, exception). the primary
purpose is to reduce the number of required reloads.
Result:
- kernel can only access memory of process that was running when
kernel was entered
- kernel must be mapped into every process page table, so traps to
kernel keep working
Problem:
- kernel must often access memory of arbitrary processes (e.g. send
arbitrary processes messages); this can't happen directly any more;
usually because that process' page table isn't loaded at all, sometimes
because that memory isn't mapped in at all, sometimes because it isn't
mapped in read-write.
So:
- kernel must be able to map in memory of any process, in its own
address space.
Implementation:
- VM and kernel share a range of memory in which addresses of
all page tables of all processes are available. This has two purposes:
. Kernel has to know what data to copy in order to map in a range
. Kernel has to know where to write the data in order to map it in
That last point is because kernel has to write in the currently loaded
page table.
- Processes and kernel are separated through segments; kernel segments
haven't changed.
- The kernel keeps the process whose page table is currently loaded
in 'ptproc.'
- If it wants to map in a range of memory, it writes the value of the
page directory entry for that range into the page directory entry
in the currently loaded map. There is a slot reserved for such
purposes. The kernel can then access this memory directly.
- In order to do this, its segment has been increased (and the
segments of processes start where it ends).
- In the pagefault handler, detect if the kernel is doing
'trappable' memory access (i.e. a pagefault isn't a fatal
error) and if so,
- set the saved instruction pointer to phys_copy_fault,
breaking out of phys_copy
- set the saved eax register to the address of the page
fault, both for sanity checking and for determining in
which of the two ranges passed to phys_copy the fault
occurred
- Some boot-time processes do not have their own page table,
and are mapped in with the kernel, and separated with
segments. The kernel detects this using HASPT. If such a
process has to be scheduled, any page table will work and
no page table switch is done.
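The reserved-slot mapping boils down to address arithmetic on the i386 page directory. With 4MB per directory entry, the entry covering an address is addr >> 22 and the offset inside it is the low 22 bits. A sketch under those assumptions: map_in(), RESERVED_PDE and the other names are hypothetical, segment offsets are ignored, and a real kernel must also flush the TLB after changing the entry.

```c
#include <assert.h>
#include <stdint.h>

#define PDE_SHIFT        22			/* 4MB covered per entry */
#define PDE_OFFSET_MASK  ((1u << PDE_SHIFT) - 1)
#define DIR_ENTRIES      1024

/* Hypothetical index of the directory entry reserved for this purpose. */
#define RESERVED_PDE     1000

/* Copy the target's directory entry covering 'addr' into the reserved
 * slot of the currently loaded directory and return the address at
 * which the kernel can now reach 'addr'. A range crossing a 4MB
 * boundary would need a second mapping. */
uint32_t map_in(uint32_t *cur_dir, const uint32_t *target_dir, uint32_t addr)
{
	uint32_t pde = addr >> PDE_SHIFT;

	cur_dir[RESERVED_PDE] = target_dir[pde];
	/* a real kernel would flush the TLB here (e.g. reload CR3) */
	return ((uint32_t)RESERVED_PDE << PDE_SHIFT) | (addr & PDE_OFFSET_MASK);
}
```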
Major changes in kernel are
- When accessing user processes' memory, the kernel no longer
explicitly checks beforehand whether that memory is OK.
It simply makes the mapping (if necessary), tries to do the
operation, and traps the pagefault if that memory isn't present;
if that happens, the copy function returns EFAULT.
So all of the CHECKRANGE_OR_SUSPEND macros are gone.
- Kernel no longer has to copy/read and parse page tables.
- A message copying optimisation: when messages are copied, and
the recipient isn't mapped in, they are copied into a buffer
in the kernel. This is done in QueueMess. The next time
the recipient is scheduled, this message is copied into
its memory. This happens in schedcheck().
This eliminates the mapping/copying step for messages, and makes
it easier to deliver messages. This eliminates soft_notify.
- Kernel no longer creates a page table at all, so the vm_setbuf
and pagetable writing in memory.c is gone.
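The message copying optimisation can be modelled as: deliver directly when the recipient's memory is reachable, otherwise park the message in a kernel-side buffer and copy it out when the recipient is next scheduled. In this sketch, queue_mess() and deliver_pending() are hypothetical counterparts of QueueMess and the schedcheck() step, and struct proc is reduced to the fields the idea needs.

```c
#include <assert.h>

typedef struct { int m_type; int m_arg; } message;

struct proc {
	int mapped_in;		/* is its memory reachable right now? */
	message *user_buf;	/* where receive() wants the message */
	message kbuf;		/* kernel-side copy of a pending message */
	int has_pending;	/* kbuf holds an undelivered message */
};

/* Deliver directly if possible, otherwise keep a copy in the kernel. */
void queue_mess(struct proc *dst, const message *m)
{
	if (dst->mapped_in) {
		*dst->user_buf = *m;
	} else {
		dst->kbuf = *m;
		dst->has_pending = 1;
	}
}

/* Called when dst is about to run, i.e. its page table is loaded. */
void deliver_pending(struct proc *dst)
{
	if (dst->has_pending) {
		*dst->user_buf = dst->kbuf;
		dst->has_pending = 0;
	}
}
```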
Minor changes in kernel are
- ipc_stats thrown out, wasn't used
- misc flags all renamed to MF_*
- NOREC_* macros to enter and leave functions that should not
be called recursively; just sanity checks really
- code to fully decode segment selectors and descriptors
to print on exceptions
- lots of vmassert()s added, only executed if DEBUG_VMASSERT is 1
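A recursion guard like the NOREC_* macros can be built from one static flag per guarded function: set it on entry, assert it was clear, and clear it on every return. The macro and function names below are hypothetical. As with vmassert(), the check disappears when assertions are compiled out (here, when NDEBUG is defined).

```c
#include <assert.h>

/* Entering while the flag is already set means recursion. */
#define NOREC_ENTER(flag) do {		\
		assert(!(flag));		\
		(flag) = 1;			\
	} while (0)

/* Clear the flag on the way out, then return. */
#define NOREC_RETURN(flag, v) do {	\
		(flag) = 0;			\
		return (v);			\
	} while (0)

static int in_guarded_copy;

int guarded_copy(int x)
{
	NOREC_ENTER(in_guarded_copy);
	x *= 2;		/* work that must not re-enter guarded_copy() */
	NOREC_RETURN(in_guarded_copy, x);
}
```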
Ben Gras [Mon, 21 Sep 2009 14:25:54 +0000 (14:25 +0000)]
- tty: only report unrecognized scancodes once; forget about
remembering the origin and cursor position as that feature didn't
really work properly anyway
- tty: map in video and font memory using a vm call, access it from C,
thereby eliminating pesky weird segment calls and assembly to access it,
and unbreaks loadfont (Roman Ignatov)
- bios_wini: fix bios_wini by allocating a <1MB buffer for it
- memory: preallocate ramdisk, makes it a bit faster (and doesn't
fail halfway if you allocate a huge one)
- floppy: use <1MB buffer
- ramdisk proto: because of the 2x1 page reservations, binaries
got a little fatter and didn't fit on the ramdisk any more.
increase it.
Ben Gras [Mon, 21 Sep 2009 14:24:29 +0000 (14:24 +0000)]
- added 'datasizes' script that shows you the size allocated
for each symbol, usually answering those "why does my binary have
such a lot of BSS" questions.
- stop binpackage looking in /var/spool for package files.
- let makewhatis recognize .Sh as heading name
- setup, fsck, df: allow >4kB block sizes painlessly
- mkfs: new #-of-inodes heuristic that depends on kb, not
on blocks; I've run out of inodes on my /usr
- asmconv: don't silently truncate .aligns to 16 bytes
- ipc* commands for shared memory support
Ben Gras [Mon, 21 Sep 2009 14:23:47 +0000 (14:23 +0000)]
- remove unused bootdelay feature
- only print a line for every boot process if 'verbose' variable set to
nonzero; reason: with serial output, the long output
significantly slows down frequent reboots, and causes 'scroll damage'
that in some cases is pretty bad. also the verbose output doesn't tell
you the one thing you might want to know about a process: how much memory
is it using? or how much memory is everything using?
- short format does print out total memory allocated for processes
Ben Gras [Mon, 21 Sep 2009 14:23:10 +0000 (14:23 +0000)]
- VM_KERN_NOPAGEZERO feature is gone
- sys_getbiosbuffer feature is gone (from kernel; available from vm)
- bump version number because munmap() calls that newly compiled binaries
will do trigger an ugly (but harmless) error message in older VMs
- some new VM calls and flags, the new IPC calls
- some new CR0 register bits
- added files for shared memory
Tomas Hruby [Tue, 15 Sep 2009 10:01:06 +0000 (10:01 +0000)]
Some clean up of the segment selectors macros
- [ABCD]_INDEX are not used anywhere
- value of *_SELECTOR is now calculated using the *_INDEX value so changing the
index does not break the selector
- TSS is now the last of the global selectors. There will be one TSS per CPU on SMP
and the number will vary depending on the maximal supported number of CPUs
configured
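On i386 a GDT selector is the descriptor's index times 8 (the descriptor size), with table indicator 0 and the requested privilege level in the low two bits, so deriving every *_SELECTOR from its *_INDEX keeps the two consistent. The indices below are illustrative rather than MINIX's real layout, and the per-CPU TSS_INDEX(cpu) form only sketches how TSS entries placed last could scale with the number of CPUs.

```c
#include <assert.h>

#define DESC_SIZE 8		/* bytes per descriptor table entry */

/* Illustrative GDT layout; the real indices may differ. */
#define DS_INDEX         2
#define CS_INDEX         3
#define TSS_INDEX_FIRST  4	/* TSS entries come last: one per CPU */

#define TSS_INDEX(cpu)   (TSS_INDEX_FIRST + (cpu))

/* Selector = byte offset of the descriptor in the GDT (TI = 0), with
 * the requested privilege level in the low two bits. */
#define SELECTOR(index, rpl)  ((index) * DESC_SIZE | (rpl))

#define DS_SELECTOR        SELECTOR(DS_INDEX, 0)
#define CS_SELECTOR        SELECTOR(CS_INDEX, 0)
#define TSS_SELECTOR(cpu)  SELECTOR(TSS_INDEX(cpu), 0)
```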
Tomas Hruby [Tue, 15 Sep 2009 09:57:22 +0000 (09:57 +0000)]
proc_addr() returns address based on location in proc array
- pproc_addr is not necessary to get the address of a process if we know its
number
- local proc variables in the system call implementations (sys_task) conflict
with the global proc array of all processes; therefore these variables were
renamed to proc_nr, as they hold the process number
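Computing a process's address from its position in the array makes a separate pointer table unnecessary. A sketch assuming the classic MINIX layout where kernel tasks have negative numbers -NR_TASKS..-1 and occupy the start of proc[]. The sizes are illustrative, and slot_to_nr() is a hypothetical inverse added only to show the arithmetic round-trips.

```c
#include <assert.h>

/* Illustrative sizes. */
#define NR_TASKS 4
#define NR_PROCS 8

struct proc { int p_nr; };

/* One array for kernel tasks (numbers -NR_TASKS..-1) followed by
 * processes (numbers 0..NR_PROCS-1). */
static struct proc proc[NR_TASKS + NR_PROCS];

/* Address derived from the position in the array. */
#define proc_addr(n)   (&proc[NR_TASKS + (n)])

/* Hypothetical inverse: recover the process number from an address. */
#define slot_to_nr(p)  ((int)((p) - proc) - NR_TASKS)
```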
Disabled check in test 28 which hard links a directory; this is not required by POSIX and not (currently) supported by MINIX. Also corrected the total number of tests.
Allow setuid tests 11 and 33 to run. The former still fails (but now with a meaningful error), while the latter succeeds. Only 2 tests are left broken on default MINIX, namely 11 and 28.