RU.LINUX
From: Sergey Lentsov (2:4615/71.10), 22 Mar 2001 18:11:10
To: All
Subject: URL: http://lwn.net/2001/0322/kernel.php3
See also: [14]last week's Kernel page.
Kernel development
The current kernel release is still 2.4.2. The current prepatch is
2.4.3pre6, released early in the morning on March 21. The [15]patch
log file is, as of this writing, only updated to 2.4.3pre5, however.
No 2.2.19 prepatches have been released this week.
Changing the memory map semaphore. One of the changes that is now in
the 2.4.3 prepatch is a new memory map locking scheme implemented by
Rik van Riel. The memory map semaphore controls access to the various
virtual memory areas and page tables used by a process; it is intended
to keep concurrent activities, such as page faults, memory map
changes, and informational queries, from stepping on each other. It is
a fundamental part of how the virtual memory system works.
It also, seemingly, is a performance problem. For example, programs
that use the /proc interface to get process information can find
themselves blocked for long periods of time. Page faults, too, can be
slowed down, even when they occur in different places and should not
conflict with each other. Multi-threaded programs, such as the MySQL
server or Apache 2.0, are restricted to handling just one page fault
at a time across the whole set of threads. In some cases, this
restriction can lead to very poor performance.
Rik's change is to turn the memory map semaphore into a variant known
as a reader-writer semaphore (or R/W semaphore). These semaphores
allow multiple threads to access a common data structure
simultaneously, as long as none of them make any changes. A thread
that needs to change things must wait until all of the readers have
finished their business, then lock them out for the duration of the
change.
An R/W semaphore suits this situation well, since both the /proc and
page fault cases do not actually need to change the memory map. With
the change applied, the system can do more things simultaneously. Even
on uniprocessor systems, things will work better, since work need not
wait for the resolution of a page fault, which can involve disk
activity.
It's also a relatively fundamental and scary change for a stable
kernel release. Even Linus, while [16]accepting the change, is a
little nervous about it:
I'm applying this to my tree - I'm not exactly comfortable with
this during the 2.4.x timeframe, but at the same time I'm even less
comfortable with the current alternative, which is to make the
regular semaphores fairer (we tried it once, and the implementation
had problems, I'm not going to try that again during 2.4.x).
The patch also, [17]as of 2.4.3pre5, "has only been tested on i386
without PAE, and is known to break other architectures." There have
been some good reports, though, on the performance effects of this
patch. But it may mean that the real 2.4.3 will not be out for a while
yet, since Linus will want to give it some time to stabilize and prove
that everything works.
Global kernel analysis. Dawson Engler, at Stanford, has put together
an extension to the gcc compiler which allows it to perform detailed,
global analysis of a body of code and point out a number of possible
bugs. Over the last week, he and his students have been posting the
results of this work. They have found some impressive things,
including:
* Places where pointers are interpreted as user-space addresses
(i.e. they are passed to a function like copy_to_user), but where
the same pointer is also dereferenced directly ([18]nine cases).
Kernel code running in process context can generally get away with
that sort of reference, but it's risky for a few reasons. The
user-space address may not be valid (or the page could have been
swapped out since the kernel last checked), and there are security
implications as well.
* Large variables on the kernel stack ([19]22 cases, plus [20]a few
more when devfs is used). The kernel stack is limited in size, and
putting large variables there risks overflowing the allocated
space.
* Various locking bugs ([21]16 cases). These include paths that
could take out a lock and forget to unlock it, and potential
misuse of the processor state flags.
* Places where kernel memory is used after it has been freed
([22]14 cases).
* Inconsistent treatment of interrupts ([23]28 cases). Code that
sometimes runs with interrupts enabled and other times not is
likely to be buggy; functions which sometimes forget to reenable
interrupts certainly are.
* Places where a pointer returned by a function that can fail is not
checked ([24]120 cases).
* Calls to functions that can block while interrupts are disabled or
spinlocks are held ([25]163 cases). Kernel code, of course, should
not block in either case, or serious performance problems (or
deadlocks) can result.
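Two of the bug classes listed above, a lock that is not released on
every path and an unchecked pointer from a function that can fail, are
easy to picture with a small user-space analogy. The sketch below uses
a pthread mutex and malloc() in place of kernel locks and allocators;
struct record_table, RECORD_MAX, and the function names are all
hypothetical, and the code is not drawn from any of the reported
cases. It shows the common single-exit pattern that avoids both
problems.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define RECORD_MAX 4096 /* hypothetical size limit for this example */

/* Hypothetical table that remembers the most recently stored record. */
struct record_table {
    pthread_mutex_t lock;
    char *last;
    size_t last_len;
};

void record_table_init(struct record_table *t)
{
    assert(t != NULL);
    pthread_mutex_init(&t->lock, NULL);
    t->last = NULL;
    t->last_len = 0;
}

/*
 * Stores a copy of data[0..len). Returns 0 on success or a negative
 * errno value. Every path, including the error paths, leaves through
 * the "out" label, so the lock is always released.
 */
int record_table_store(struct record_table *t, const char *data, size_t len)
{
    int ret = 0;
    char *copy;

    pthread_mutex_lock(&t->lock);

    if (len == 0 || len > RECORD_MAX) {
        ret = -EINVAL;
        goto out; /* an early "return" here would leak the lock */
    }

    copy = malloc(len);
    if (copy == NULL) { /* malloc() can fail, so check before use */
        ret = -ENOMEM;
        goto out;
    }
    memcpy(copy, data, len);

    free(t->last); /* free(NULL) is a no-op on the first call */
    t->last = copy;
    t->last_len = len;
out:
    pthread_mutex_unlock(&t->lock);
    return ret;
}
```

The buggy variants are what one gets by replacing either goto with a
plain return, or by dropping the NULL check before the memcpy() call.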
The response from the kernel hackers has been quite positive, for one
simple reason: quite a few new bugs have been found. Many of the
things being tested for are the sort of subtle bug that can be very
easy to create and hard to track down.
The tool that is doing this work is called "MC" ("meta-level
compilation"); it was created by a team headed by Mr. Engler and
sponsored by DARPA grant MDA904-98-C-A933. MC defines an extension
language for gcc called "metal," which can be used to program specific
checks to be applied to the code. Here, for example, is a piece of
code which looks for errors in enabling and disabling interrupts:
    { #include "linux-includes.h" }
    sm check_interrupts {
        // Variables used in patterns
        decl { unsigned } flags;

        // Patterns to specify enable/disable functions
        pat enable = { sti(); }
                   | { restore_flags(flags); };
        pat disable = { cli(); };

        // States
        // The first state is the initial state
        is_enabled: disable ==> is_disabled
            | enable ==> { err("double enable"); };
        is_disabled: enable ==> is_enabled
            | disable ==> { err("double disable"); }
            // Special pattern that matches when the SM
            // hits the end of any path in this state
            | $end_of_path$ ==> { err("exiting w/intr disabled!"); };
    }
Those who are interested in MC should check out Mr. Engler's paper
"Checking system rules using system-specific, programmer-written
compiler extensions," which is available on the net [26]in PostScript
format. The code fragment above was taken from that paper. Please
don't bug Mr. Engler about obtaining the code, however; the system is
still under development and has not yet been generally released. In
time, however, it should become part of the standard kernel hacker's
toolkit.
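For readers who do not have MC at hand, the logic of the
check_interrupts state machine shown earlier can be imitated in plain
C by walking one execution path, represented as an array of events.
This is only a sketch of the idea, not of how MC works internally: the
enum names and check_path() are invented here, and stopping at the
first error is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* One step on a code path: a call that enables or disables interrupts. */
enum intr_event { EV_ENABLE, EV_DISABLE };

/* Possible outcomes, mirroring the three err() messages in the checker. */
enum intr_verdict {
    INTR_OK,
    INTR_DOUBLE_ENABLE,
    INTR_DOUBLE_DISABLE,
    INTR_EXIT_DISABLED
};

/*
 * Walks a single path of n events, starting in the "is_enabled" state,
 * and reports the first rule violation found (an assumption of this
 * sketch), or INTR_OK if the path ends with interrupts enabled.
 */
enum intr_verdict check_path(const enum intr_event *path, size_t n)
{
    int enabled = 1; /* the initial state is is_enabled */
    size_t i;

    assert(path != NULL || n == 0);

    for (i = 0; i < n; i++) {
        if (path[i] == EV_DISABLE) {
            if (!enabled)
                return INTR_DOUBLE_DISABLE;
            enabled = 0; /* is_enabled: disable ==> is_disabled */
        } else {
            if (enabled)
                return INTR_DOUBLE_ENABLE;
            enabled = 1; /* is_disabled: enable ==> is_enabled */
        }
    }

    /* Reaching the end of the path while disabled is the third error. */
    return enabled ? INTR_OK : INTR_EXIT_DISABLED;
}
```

A path consisting of a disable followed by an enable passes, while a
path that ends after a lone disable is reported as exiting with
interrupts disabled. The real tool finds such paths in the source code
by itself; here they have to be supplied by hand.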
JFFS2 released. The folks at Red Hat have [27]announced the release of
the JFFS2 filesystem. It's a complete reimplementation of Axis
Communications' Journaling Flash Filesystem, with a number of
improvements. It's available via CVS, and only works with the 2.4
kernel. An iPAQ kernel with JFFS2 built in is available as well.
Help out the kernel manual pages. Andries Brouwer has released
[28]man-pages-1.35. In the announcement, he notes:
David Mosberger expressed his worry that especially man page
Section 2 is out-dated and x86 specific, with no indication that
other architectures even exist. No doubt he is right.
So the request has gone out: please point out the man pages that are
wrong, and, if possible, supply fixes while you're at it. This is a
good way for people to help out without having to actually hack on the
kernel code.
FSM's kernel patch. Kernel patches do not normally come with press
releases, or, at least, they didn't. This week, FSMLabs (the RTLinux
company) [29]announced that it had released a memory management patch.
It seems that a memory management change in 2.4 creates some
difficulties for RTLinux, so they went and developed a fix. And
announced it to the world.
[30]The patch itself is quite small, especially considering that the
one real chunk of code there is lifted from the MIPS version of
<asm/pgalloc.h>. It adds a couple of big kernel lock invocations, and
a function which propagates page directory changes across processes
and CPUs. That's evidently enough to restore low latency on a reliable
basis for real-time tasks.
Other patches and updates released this week include:
* David Miller has released [31]a new zero-copy networking patch.
Note that this code can also be found in Alan Cox's 2.4.2-ac
patches.
* [32]ksymoops-2.4.1 was released by Keith Owens.
* Andrew Morton has [33]fixed a number of problems with the secure
attention key handling. There were some serious errors in that
code; as he puts it, "it's pretty obvious that nobody has been
testing SAK."
* Eric Raymond has released [34]cml2-0.9.4, the latest version of
his new kernel configuration scheme. It includes the performance
improvements discussed in [35]last week's LWN.net kernel page.
* Karim Yaghmour has [36]released version 0.9.4 of the Linux Trace
Toolkit. It works with Linux and RTAI on the x86 and PowerPC
architectures.
* IBM has released [37]version 0.2.1 of its "JFS" journaled
filesystem.
* Andreas Gruenbacher has [38]released version 0.7.9 of the access
control list patch. This patch fixes two problems with NFS.
* [39]iptables 1.2.1 is out. See the [40]small fix that was posted
shortly afterward, though, if you want to install this release.
Section Editor: [41]Jonathan Corbet
March 22, 2001
For other kernel news, see:
* [42]Kernelnotes
* [43]Kernel traffic
* [44]Kernel Newsflash
* [45]Kernel Trap
Other resources:
* [46]Kernel Source Reference
* [47]L-K mailing list FAQ
* [48]Linux-MM
* [49]Linux Scalability Project
* [50]Kernel Newbies
[52]Eklektix, Inc. Linux powered! Copyright © 2001 [53]Eklektix, Inc.,
all rights reserved
Linux (R) is a registered trademark of Linus Torvalds
References
1. http://lwn.net/
2. http://ads.tucows.com/click.ng/pageid=001-012-132-000-000-003-000-000-012
3. http://lwn.net/2001/0322/
4. http://lwn.net/2001/0322/security.php3
5. http://lwn.net/2001/0322/dists.php3
6. http://lwn.net/2001/0322/desktop.php3
7. http://lwn.net/2001/0322/devel.php3
8. http://lwn.net/2001/0322/commerce.php3
9. http://lwn.net/2001/0322/press.php3
10. http://lwn.net/2001/0322/announce.php3
11. http://lwn.net/2001/0322/history.php3
12. http://lwn.net/2001/0322/letters.php3
13. http://lwn.net/2001/0322/bigpage.php3
14. http://lwn.net/2001/0315/kernel.php3
15. http://lwn.net/2001/0322/a/2.4.3pre5.php3
16. http://lwn.net/2001/0322/a/lt-mmap_sem.php3
17. http://lwn.net/2001/0322/a/lt-pre5.php3
18. http://lwn.net/2001/0322/a/c-userspace-pointers.php3
19. http://lwn.net/2001/0322/a/c-big-variables.php3
20. http://lwn.net/2001/0322/a/c-devfs.php3
21. http://lwn.net/2001/0322/a/c-locking.php3
22. http://lwn.net/2001/0322/a/c-use-after-free.php3
23. http://lwn.net/2001/0322/a/c-interrupt.php3
24. http://lwn.net/2001/0322/a/c-failures.php3
25. http://lwn.net/2001/0322/a/c-blocking.php3
26. http://www.stanford.edu/~engler/mc-osdi.ps
27. http://lwn.net/2001/0322/a/jffs2.php3
28. http://lwn.net/2001/0322/a/man-pages.php3
29. http://lwn.net/2001/0322/a/fsm-rtlinux.php3
30. http://lwn.net/2001/0322/a/vmalloc_fix.php3
31. http://lwn.net/2001/0322/a/zerocopy.php3
32. http://lwn.net/2001/0322/a/ksymoops.php3
33. http://lwn.net/2001/0322/a/SAK.php3
34. http://lwn.net/2001/0322/a/cml.php3
35. http://lwn.net/2001/0315/kernel.php3
36. http://lwn.net/2001/0322/a/ltt.php3
37. http://lwn.net/2001/0322/a/jfs.php3
38. http://lwn.net/2001/0322/a/acl.php3
39. http://lwn.net/2001/0322/a/iptables.php3
40. http://lwn.net/2001/0322/a/iptables-fix.php3
41. mailto:lwn@lwn.net
42. http://www.kernelnotes.org/
43. http://kt.zork.net/
44. http://www.atnf.csiro.au/~rgooch/linux/docs/kernel-newsflash.html
45. http://www.kerneltrap.com/
46. http://lksr.org/
47. http://www.tux.org/lkml/
48. http://www.linux.eu.org/Linux-MM/
49. http://www.citi.umich.edu/projects/linux-scalability/
50. http://www.kernelnewbies.org/
51. http://lwn.net/2001/0322/dists.php3
52. http://www.eklektix.com/
53. http://www.eklektix.com/