Tomas Hruby [Tue, 9 Feb 2010 15:20:09 +0000 (15:20 +0000)]
Removal of the system task
* Userspace change to use the new kernel calls
- _taskcall(SYSTASK...) changed to _kernel_call(...)
- int 32 reused for the kernel calls
- _do_kernel_call() to make the trap to kernel
- kernel_call() to make the actual kernel call from C using
_do_kernel_call()
- unlike an ipc call, the kernel call itself always succeeds as the
kernel is always available; however, the kernel may return an error
* Kernel side implementation of kernel calls
- the SYSTEM task does not run, only the proc table entry is
preserved
- every data_copy(SYSTEM is now data_copy(KERNEL
- "locking" is an empty operation now as everything runs in
kernel
- sys_task() is replaced by kernel_call() which copies the
message into kernel, dispatches the call to its handler and
finishes by either copying the results back to userspace (if
need be) or by suspending the process because of VM
- suspended processes are later made runnable once the memory
issue is resolved, picked up by the scheduler and only at
this time the call is resumed (in fact restarted) which does
not need to copy the message from userspace as the message
is already saved in the process structure.
- no need for the vmrestart queue, the scheduler will restart
the system calls
- no special case in do_vmctl(), all requests remove the
RTS_VMREQUEST flag
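The kernel_call() flow described above (copy in, dispatch, then either copy the result back or suspend and later restart from the saved message) can be sketched roughly as follows. This is a simplified model and not the actual kernel code: the message layout, the struct proc fields, the SK_* status codes, the two handlers and the memory_ready variable are all hypothetical and exist only for illustration.

```c
#include <string.h>

/* All names below are hypothetical stand-ins, not the MINIX definitions. */
#define SK_NR_CALLS    2
#define SK_OK          0
#define SK_EBADCALL  (-1)
#define SK_VMSUSPEND (-2)   /* the handler has to wait for VM */

typedef struct { int m_type; int arg; int result; } message;

struct proc {
    message saved_msg;   /* request saved in the process structure */
    message *user_msg;   /* where the request lives in "userspace" */
    int vm_suspended;    /* set while a memory issue is pending */
};

typedef int (*call_handler_t)(struct proc *caller, message *m);

int memory_ready = 0;    /* simulated: has VM resolved the memory issue? */

static int do_double(struct proc *caller, message *m)
{
    (void)caller;
    m->result = 2 * m->arg;
    return SK_OK;
}

static int do_needs_vm(struct proc *caller, message *m)
{
    (void)caller;
    if (!memory_ready)
        return SK_VMSUSPEND;
    m->result = m->arg + 1;
    return SK_OK;
}

static call_handler_t call_vec[SK_NR_CALLS] = { do_double, do_needs_vm };

/* Dispatch the saved message; reply to userspace or suspend the caller. */
static int dispatch_saved(struct proc *caller)
{
    message *m = &caller->saved_msg;
    int r;

    if (m->m_type < 0 || m->m_type >= SK_NR_CALLS)
        r = SK_EBADCALL;
    else
        r = call_vec[m->m_type](caller, m);

    if (r == SK_VMSUSPEND) {
        /* keep saved_msg untouched so the call can be restarted */
        caller->vm_suspended = 1;
        return r;
    }
    caller->vm_suspended = 0;
    m->m_type = r;                              /* status travels in the message */
    memcpy(caller->user_msg, m, sizeof(*m));    /* copy the result back */
    return r;
}

/* First entry: copy the message into the kernel, then dispatch. */
int kernel_call_sketch(struct proc *caller, message *user_msg)
{
    caller->user_msg = user_msg;
    memcpy(&caller->saved_msg, user_msg, sizeof(*user_msg));
    return dispatch_saved(caller);
}

/* Restart after the memory issue is resolved: no second copy-in needed. */
int kernel_call_resume_sketch(struct proc *caller)
{
    return dispatch_saved(caller);
}
```

In this model the scheduler would call kernel_call_resume_sketch() when it picks up a process that was suspended because of VM, which is why no separate restart queue is needed.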
Tomas Hruby [Tue, 9 Feb 2010 15:15:45 +0000 (15:15 +0000)]
copy_msg_from_user() and copy_msg_to_user()
- copies a message from/to userspace without need of translating
addresses
- the assumption is that the address space is installed, i.e. ldt and
cr3 are loaded correctly
- if a pagefault or a general protection fault occurs while copying from
userland to kernel (or vice versa) an error is returned which gives
the caller a chance to respond in a proper way
- error happens _only_ because of a wrong user pointer if the function
is used correctly
- if the prerequisites of the function do not hold, the function will
most likely fail as the user address becomes random
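The contract of the two functions (copy directly, return an error instead of crashing on a bad user pointer) can be illustrated in plain C. The real functions rely on the pagefault and protection fault handlers to detect a bad pointer; the sketch below swaps that for an explicit bounds check against a simulated user region, so user_mem, SK_EFAULT, the message layout and the _sketch function names are assumptions made for illustration only.

```c
#include <stdint.h>
#include <string.h>

#define SK_EFAULT (-1)          /* hypothetical error code */

typedef struct { int m_type; int payload[8]; } message;

/* Simulated "installed" user address space; offsets play the role of
 * user pointers. */
unsigned char user_mem[256];

/* Stand-in for the fault: the real code would trap instead. */
static int range_ok(uintptr_t off, size_t len)
{
    return off <= sizeof(user_mem) && len <= sizeof(user_mem) - off;
}

int copy_msg_from_user_sketch(uintptr_t user_off, message *dst)
{
    if (!range_ok(user_off, sizeof(*dst)))
        return SK_EFAULT;       /* wrong user pointer */
    memcpy(dst, user_mem + user_off, sizeof(*dst));
    return 0;
}

int copy_msg_to_user_sketch(const message *src, uintptr_t user_off)
{
    if (!range_ok(user_off, sizeof(*src)))
        return SK_EFAULT;       /* wrong user pointer */
    memcpy(user_mem + user_off, src, sizeof(*src));
    return 0;
}
```

A caller would typically check the return value and fail the request of the offending process rather than panic.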
Tomas Hruby [Tue, 9 Feb 2010 15:13:52 +0000 (15:13 +0000)]
Early address space switch
- switch_address_space() implements a switch of the user address space
for the destination process
- this makes memory of this process easily accessible, e.g. a pointer
valid in the userspace can be used with little complexity to
access the process's memory
- the switch no longer happens just before we return to userspace;
instead it happens right after we know which process we are going
to schedule, i.e. before we start processing the misc flags
of this process, so its memory is available
- if the process becomes not runnable while processing the misc flags
we pick a new process and we switch the address space again which
introduces possibly a little bit more overhead, however, it is
hopefully hidden by reducing the overheads when we actually access
the memory
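The ordering described above (pick, switch, process misc flags, pick again if needed) can be modelled with a small loop. This is only a sketch under simplifying assumptions: the struct proc fields, the as_id/current_as bookkeeping and the helper names with the _sketch suffix are hypothetical and do not mirror the kernel's real data structures.

```c
#include <stddef.h>

/* Hypothetical, simplified process slot. */
struct proc {
    int runnable;
    int misc_flags;       /* nonzero: work to do before returning to user */
    int blocks_on_flags;  /* simulation knob: handling the flags blocks us */
    int as_id;            /* identifies the process's address space */
};

int current_as = -1;      /* address space currently installed */
int as_switches = 0;      /* counts the switches actually performed */

static void switch_address_space_sketch(struct proc *p)
{
    if (current_as != p->as_id) {
        current_as = p->as_id;
        as_switches++;
    }
}

static struct proc *pick_proc_sketch(struct proc *tab, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (tab[i].runnable)
            return &tab[i];
    return NULL;
}

static void handle_misc_flags_sketch(struct proc *p)
{
    /* p's memory is reachable here: its address space is installed */
    p->misc_flags = 0;
    if (p->blocks_on_flags)
        p->runnable = 0;
}

struct proc *schedule_sketch(struct proc *tab, size_t n)
{
    struct proc *p;

    while ((p = pick_proc_sketch(tab, n)) != NULL) {
        switch_address_space_sketch(p);   /* early switch */
        if (p->misc_flags)
            handle_misc_flags_sketch(p);
        if (p->runnable)
            return p;                     /* return to userspace with p */
        /* p blocked while handling its flags: pick and switch again */
    }
    return NULL;                          /* nothing runnable */
}
```

The extra switch in the blocked case is the overhead mentioned above; in this model it shows up as one more increment of as_switches.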
Tomas Hruby [Tue, 9 Feb 2010 15:12:20 +0000 (15:12 +0000)]
System task initialization moved to main()
- the system task initialization code does not really need to be part
of the system task process. An earlier initialization in kernel is
cleaner as it does not only initialize the syscalls but also irq
hooks etc.
Fixes for truncate system calls:
- VFS: check for negative sizes in all truncate calls
- VFS: update file size after truncating with fcntl(F_FREESP)
- VFS: move pos/len checks for F_FREESP with l_len!=0 from FS to VFS
- MFS: do not zero data block for small files when fully truncating
- MFS: do not write out freed indirect blocks after freeing space
- MFS: make truncate work correctly with differing zone/block sizes
- tests: add new test50 for truncate call family
Ben Gras [Wed, 3 Feb 2010 13:29:14 +0000 (13:29 +0000)]
small asmconv cleanups.
- put asmconv in /usr/bin so it can be invoked without absolute path
- make it ignore .end in gnu output mode so that it can be invoked
without '|| true' in the gnu lib makefiles and it doesn't produce the
messy error message
Statistical profiling fixes:
- PM: get rid of umap warning
- sprofalyze.pl: update with recently added servers and drivers
- sprofalyze.pl: properly truncate process names for sample matching
Tomas Hruby [Wed, 3 Feb 2010 09:04:48 +0000 (09:04 +0000)]
This patch removes the global variables who_p and who_e from the
kernel (sys task). The main reason is that these would have to become
cpu local variables on SMP. Once the system task is not a task but a
genuine part of the kernel there is even less reason to have these
extra variables as proc_ptr will already contain all necessary
information. In addition converting who_e to the process pointer and
back again all the time will be avoided.
Although proc_ptr will contain all important information, accessing it
as a cpu local variable will be fairly expensive, hence the value
would be assigned to some on stack local variable. Therefore it is
better to add the 'caller' argument to the syscall handlers to pass
the value on stack anyway. It also clearly denotes on whose behalf
the syscall is being executed.
This patch also ANSIfies the syscall function headers.
Last but not least, it also fixes a potential bug in virtual_copy_f()
in case the check is disabled. So far the function in case of a
failure could possibly reuse an old who_p in case this function had
not been called from the system task.
virtual_copy_f() takes the caller as a parameter too. In case the
checking is disabled, the caller must be NULL and non NULL if it is
enabled as we must be able to suspend the caller.
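The shape of the change can be sketched as follows. The handler name, the struct fields, the error codes and the simulate_fault parameter are hypothetical; only the idea of passing the caller explicitly, and of a NULL caller meaning that checking is disabled, comes from the description above.

```c
#include <stddef.h>

#define SK_EFAULT  (-1)   /* hypothetical: copy failed, nobody to suspend */
#define SK_SUSPEND (-2)   /* hypothetical: caller was suspended */

struct proc { int p_endpoint; int p_suspended; };
typedef struct { int m_type; int m_source; } message;

/* ANSI-style handler: the caller arrives as an argument instead of
 * being looked up through the old who_p/who_e globals. */
int do_example_sketch(struct proc *caller, message *m_ptr)
{
    m_ptr->m_source = caller->p_endpoint;
    return 0;
}

/* Copy helper in the spirit of virtual_copy_f(): caller == NULL means
 * checking is disabled, so a failure is just an error; a non-NULL
 * caller can be suspended until the memory issue is resolved. */
int virtual_copy_sketch(struct proc *caller, int simulate_fault)
{
    if (!simulate_fault)
        return 0;
    if (caller == NULL)
        return SK_EFAULT;
    caller->p_suspended = 1;
    return SK_SUSPEND;
}
```

Since no global is consulted on the failure path, a stale who_p cannot be picked up by accident in this model.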
Fixed a number of complaints about missing return statements.
Some cases were fixed by declaring the function void, others were fixed
by adding a return <value> statement, thereby avoiding potentially
incorrect behavior (usually in error handling).
Some enum correctness in boot.c.
Removed some uses of uninitialized variables in update.c, presumably remnants of old color support.
Fixed a few cases where free-ed memory blocks were subsequently read.
Removed some unused variables, #includes, other small cleanup.
Thomas Veerman [Thu, 21 Jan 2010 09:32:15 +0000 (09:32 +0000)]
- Fix dangling symlink regression
- Make open(2) more POSIX compliant
- Add a test case for dangling symlinks and open() syscall with O_CREAT and
O_EXCL on a symlink.
- Update open(2) man page to reflect change.
Removed unused code in the ethernet driver that was left from an old implementation
Removed/rewritten the use of uninitialized variables in error messages.
Ben Gras [Mon, 18 Jan 2010 14:10:04 +0000 (14:10 +0000)]
Fix to make making a bootable cd possible again,
now that the image has grown beyond the 1.44M that fits on a floppy.
(previously, the floppy emulation mode was used for cd's.)
the boot cd now uses 'no emulation mode,' where an image is provided on
the cd that is loaded and executed directly. this is the boot monitor.
in order to make this work (the entry point is the same as where the
image is loaded, and the boot monitor needs its a.out header too) and
keep compatibility with the same code being used for regular booting, i
prepended 16 bytes that jump over its header so execution can start
there.
to be able to read the CD (mostly in order to read the boot image),
boot has to use the already present 'extended read' call, but address
the CD using 2k sectors.
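The 2k-sector addressing boils down to a small conversion. The sketch below assumes that the caller thinks in 512-byte sectors, which is an assumption for illustration and not something stated above; the struct and function names are hypothetical as well.

```c
#define SK_SECTOR_SIZE     512u    /* assumed regular sector size */
#define SK_CD_SECTOR_SIZE  2048u   /* the "2k sectors" of the CD */
#define SK_RATIO (SK_CD_SECTOR_SIZE / SK_SECTOR_SIZE)

struct cd_addr {
    unsigned long cd_sector;   /* 2k sector to pass to the extended read */
    unsigned offset;           /* byte offset of the data in that sector */
};

/* Map a 512-byte sector number onto the 2k sector that contains it. */
struct cd_addr to_cd_addr_sketch(unsigned long sector512)
{
    struct cd_addr a;

    a.cd_sector = sector512 / SK_RATIO;
    a.offset = (unsigned)(sector512 % SK_RATIO) * SK_SECTOR_SIZE;
    return a;
}
```

With these sizes, four consecutive 512-byte sectors share one CD sector, so a reader would fetch the whole 2k sector and pick the data at the returned offset.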
Tomas Hruby [Sat, 16 Jan 2010 20:53:55 +0000 (20:53 +0000)]
NMI watchdog is an awesome feature for debugging locked up kernels.
There is not that much use for it on a single CPU, however, a deadlock
between kernel and system task can be detected, or a runaway loop.
If a kernel gets locked up the timer interrupts don't occur (as all
interrupts are disabled in kernel mode). The only chance is to
interrupt the kernel by a non-maskable interrupt.
This patch generates NMIs using performance counters. It uses the most
widely available performance counters. As the performance counters are
highly model-specific this patch is not guaranteed to work on every
machine. Unfortunately this is also true for KVM :-/ On the other
hand adding this feature for other models is not extremely difficult
and the framework makes it hopefully easy enough.
Depending on the frequency of the CPU an NMI is generated at most
about every 0.5s. If the cpu's speed is less than 2GHz it is generated
at most every 1s. In general an NMI is generated much less often as
the performance counter counts down only if the cpu is not idle.
Therefore the overhead of this feature is fairly minimal even if the
load is high.
Upon detecting that the kernel is locked up the kernel dumps the
state of the kernel registers and panics.
Local APIC must be enabled for the watchdog to work.
The code is _always_ compiled in, however, it is only enabled if
watchdog=<non-zero> is set in the boot monitor.
One corner case is serial console debugging. As dumping a lot of stuff
to the serial link may take a lot of time, the watchdog does not
detect lockups during this time, as it would result in too many
false positives. 10 NMIs have to be handled before the lockup is
detected. This means something between ~5s and 10s.
Another corner case is that the watchdog is enabled only after the
paging is enabled as it would be pure madness to try to get it right.
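The detection rule can be sketched as a small counter. The text only states that 10 NMIs have to be handled before a lockup is declared and that serial debugging is exempt; measuring progress by comparing a timer tick counter between NMIs is an assumption of this sketch, and the structure and function names are hypothetical.

```c
#define SK_WATCHDOG_NMI_LIMIT 10   /* from the text: 10 NMIs before a lockup */

struct watchdog_state {
    unsigned long last_ticks;   /* tick counter seen at the previous NMI */
    int stalled_nmis;           /* consecutive NMIs without progress */
};

/* Called from the (simulated) NMI handler. Returns 1 when the kernel is
 * considered locked up, 0 otherwise. */
int watchdog_nmi_sketch(struct watchdog_state *w, unsigned long ticks_now,
                        int serial_debug_active)
{
    /* Progress was made, or we are busy dumping to the serial line:
     * start counting again. */
    if (serial_debug_active || ticks_now != w->last_ticks) {
        w->last_ticks = ticks_now;
        w->stalled_nmis = 0;
        return 0;
    }
    w->stalled_nmis++;
    return w->stalled_nmis >= SK_WATCHDOG_NMI_LIMIT;
}
```

With one NMI at most every 0.5s to 1s, 10 stalled NMIs correspond to the ~5s to 10s mentioned above; on a positive result the real handler would dump the registers and panic.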
Merge of Wu's GSOC 09 branch (src.20090525.r4372.wu)
Main changes:
- COW optimization for safecopy.
- safemap, a grant-based interface for sharing memory regions between processes.
- Integration with safemap and complete rework of DS, supporting new data types
natively (labels, memory ranges, memory mapped ranges).
- For further information:
http://wiki.minix3.org/en/SummerOfCode2009/MemoryGrants
Additional changes not included in the original Wu's branch:
- Fixed unhandled case in VM when using COW optimization for safecopy in case
of a block that has already been shared as SMAP.
- Better interface and naming scheme for sys_saferevmap and ds_retrieve_map
calls.
- Better input checking in syslib: check for page alignment when creating
memory mapping grants.
- DS notifies subscribers when an entry is deleted.
- Documented the behavior of indirect grants in case of memory mapping.
- Test suite in /usr/src/test/safeperf|safecopy|safemap|ds/* reworked
and extended.
- Minor fixes and general cleanup.
- TO-DO: Grant ids should be generated and managed the way endpoints are to make
sure grant slots are never misreused.