- RU.LINUX ---------------------------------------------------------------------
 From : Sergey Lentsov          2:4615/71.10          22 Mar 2001  18:11:10
 To : All
 Subject : URL: http://lwn.net/2001/0322/kernel.php3
--------------------------------------------------------------------------------

See also: [14]last week's Kernel page.

Kernel development

The current kernel release is still 2.4.2. The current prepatch is 2.4.3pre6, released early in the morning on March 21. The [15]patch log file is, as of this writing, only updated to 2.4.3pre5, however. No 2.2.19 prepatches have been released this week.

Changing the memory map semaphore. One of the changes that is now in the 2.4.3 prepatch is a new memory map locking scheme implemented by Rik van Riel. The memory map semaphore controls access to the various virtual memory areas and page tables used by a process; it is intended to keep concurrent activities, such as page faults, memory map changes, and informational queries, from stepping on each other. It is a fundamental part of how the virtual memory system works.

It also, seemingly, is a performance problem. For example, programs that use the /proc interface to get process information can find themselves blocked for long periods of time. Page faults, too, can be slowed down, even when they occur in different places and should not conflict with each other. Multi-threaded programs, such as the MySQL server or Apache 2.0, are restricted to handling just one page fault at a time across the whole set of threads. In some cases, this restriction can lead to very poor performance.

Rik's change is to turn the memory map semaphore into a variant known as a reader-writer semaphore (or R/W semaphore).
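
As a rough illustration of what "reader-writer" means here, the sketch below models only the admission rule of such a lock in plain C: any number of readers may be inside together, while a writer must have the structure to itself. It is not the kernel's implementation and does no blocking or waking; the type `rw_model` and the `try_*` function names are hypothetical, chosen for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a reader-writer lock's admission rule.
 * It never sleeps; it only answers "may this caller enter now?". */
struct rw_model {
    int  readers;   /* number of readers currently inside */
    bool writer;    /* true while a writer holds the lock */
};

/* A reader may enter as long as no writer is inside. */
static bool try_read_lock(struct rw_model *s)
{
    if (s->writer)
        return false;
    s->readers++;
    return true;
}

static void read_unlock(struct rw_model *s)
{
    assert(s->readers > 0);   /* unlock without a matching lock is a bug */
    s->readers--;
}

/* A writer needs exclusive access: no readers and no other writer. */
static bool try_write_lock(struct rw_model *s)
{
    if (s->writer || s->readers > 0)
        return false;
    s->writer = true;
    return true;
}

static void write_unlock(struct rw_model *s)
{
    assert(s->writer);        /* only the holder may release */
    s->writer = false;
}
```

In this model two readers can be inside at once, and `try_write_lock()` succeeds only after both have called `read_unlock()`. A real semaphore would put the refused caller to sleep instead of returning false.
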
These semaphores allow multiple threads to access a common data structure simultaneously, as long as none of them make any changes. Once a thread needs to change things, it must wait until all of the readers have finished their business, then lock them out for the duration of the change. An R/W semaphore suits this situation well, since neither the /proc case nor the page fault case actually needs to change the memory map.

With the change applied, the system can do more things simultaneously. Even on uniprocessor systems, things will work better, since work need not wait for the resolution of a page fault, which can involve disk activity.

It's also a relatively fundamental and scary change for a stable kernel release. Even Linus, while [16]accepting the change, is a little nervous about it:

    I'm applying this to my tree - I'm not exactly comfortable with this during the 2.4.x timeframe, but at the same time I'm even less comfortable with the current alternative, which is to make the regular semaphores fairer (we tried it once, and the implementation had problems, I'm not going to try that again during 2.4.x).

The patch also, [17]as of 2.4.3pre5, "has only been tested on i386 without PAE, and is known to break other architectures." There have been some good reports, though, on the performance effects of this patch. But it may mean that the real 2.4.3 will not be out for a while yet, since Linus will want to give it some time to stabilize and prove that everything works.

Global kernel analysis. Dawson Engler, at Stanford, has put together an extension to the gcc compiler which allows it to perform detailed, global analysis of a body of code and point out a number of possible bugs. Over the last week, he and his students have been posting the results of this work. They have found some impressive things, including:

  * Places where pointers are interpreted as user-space addresses (i.e. they are passed to a function like copy_to_user), but where the same pointer is also dereferenced directly ([18]nine cases). Kernel code running in process context can generally get away with that sort of reference, but it's risky for a few reasons. The user-space address may not be valid (or the page could have been swapped out since the kernel last checked), and there are security implications as well.

  * Large variables on the kernel stack ([19]22 cases, plus [20]a few more when devfs is used). The kernel stack is limited in size, and putting large variables there risks overflowing the allocated space.

  * Various locking bugs ([21]16 cases). These include paths that could take out a lock and forget to unlock it, and potential misuse of the processor state flags.

  * Places where kernel memory is used after it has been freed ([22]14 cases).

  * Inconsistent treatment of interrupts ([23]28 cases). Code that sometimes runs with interrupts enabled and other times not is likely to be buggy; functions which sometimes forget to reenable interrupts certainly are.

  * Places where a pointer returned by a function that can fail is not checked ([24]120 cases).

  * Calls to functions that can block while interrupts are disabled or spinlocks are held ([25]163 cases). Kernel code, of course, should not block in either case, or serious performance problems (or deadlocks) can result.

The response from the kernel hackers has been quite positive, for one simple reason: quite a few new bugs have been found. Many of the things being tested for are the sort of subtle bug that can be very easy to create and hard to track down.

The tool that is doing this work is called "MC" ("meta-level compilation"); it was created by a team headed by Mr. Engler and sponsored by DARPA grant MDA904-98-C-A933. MC defines an extension language for gcc called "metal," which can be used to program specific checks to be applied to the code.
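
To give a feel for what such a check amounts to, here is a rough sketch in plain C of a two-state machine that walks the enable/disable events along a single code path. It mirrors the interrupt check in the metal fragment shown next, but it is neither metal nor a description of how MC works internally; the names `intr_event`, `intr_error`, and `check_path` are hypothetical, and stopping at the first violation is a simplification made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the calls a checker would match:
 * EV_DISABLE for cli(), EV_ENABLE for sti() or restore_flags(). */
enum intr_event { EV_DISABLE, EV_ENABLE };

enum intr_state { IS_ENABLED, IS_DISABLED };

enum intr_error {
    ERR_NONE,            /* path is clean */
    ERR_DOUBLE_ENABLE,   /* enable while already enabled */
    ERR_DOUBLE_DISABLE,  /* disable while already disabled */
    ERR_EXIT_DISABLED    /* path ends with interrupts off */
};

/* Walk one path and report the first violation found.
 * The path is assumed to start with interrupts enabled. */
static enum intr_error check_path(const enum intr_event *path, size_t len)
{
    enum intr_state state = IS_ENABLED;

    assert(path != NULL || len == 0);

    for (size_t i = 0; i < len; i++) {
        if (state == IS_ENABLED) {
            if (path[i] == EV_ENABLE)
                return ERR_DOUBLE_ENABLE;
            state = IS_DISABLED;
        } else {
            if (path[i] == EV_DISABLE)
                return ERR_DOUBLE_DISABLE;
            state = IS_ENABLED;
        }
    }

    /* End of path: leaving with interrupts disabled is an error. */
    return state == IS_DISABLED ? ERR_EXIT_DISABLED : ERR_NONE;
}
```

A path that disables and then enables interrupts comes back as `ERR_NONE`, while a path that disables and simply returns comes back as `ERR_EXIT_DISABLED`. A real checker has to do this for every path through every function, which is where the compiler's help comes in.
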
Here, for example, is a piece of code which looks for errors in enabling and disabling interrupts:

    { #include "linux-includes.h" }
    sm check_interrupts {
        // Variables used in patterns
        decl { unsigned } flags;

        // Patterns to specify enable/disable functions
        pat enable = { sti(); }
                   | { restore_flags(flags); };
        pat disable = { cli(); };

        // States
        // The first state is the initial state
        is_enabled: disable ==> is_disabled
            | enable ==> { err("double enable"); };
        is_disabled: enable ==> is_enabled
            | disable ==> { err("double disable"); }
            // Special pattern that matches when the SM
            // hits the end of any path in this state
            | $end_of_path$ ==>
                { err("exiting w/intr disabled!"); };
    }

Those who are interested in MC should check out Mr. Engler's paper "Checking system rules using system-specific, programmer-written compiler extensions," which is available on the net [26]in PostScript format. The code fragment above was taken from that paper. Please don't bug Mr. Engler about obtaining the code, however; the system is still under development and has not yet been generally released. In time, however, it should become part of the standard kernel hacker's toolkit.

JFFS2 released. The folks at Red Hat have [27]announced the release of the JFFS2 filesystem. It's a complete reimplementation of Axis Communications' Journaling Flash Filesystem, with a number of improvements. It's available via CVS, and only works with the 2.4 kernel. An iPAQ kernel with JFFS2 built in is available as well.

Help out the kernel manual pages. Andries Brouwer has released [28]man-pages-1.35. In the announcement, he notes:

    David Mosberger expressed his worry that especially man page Section 2 is out-dated and x86 specific, with no indication that other architectures even exist. No doubt he is right.

So the request has gone out: please point out the man pages that are wrong, and, if possible, supply fixes while you're at it.
This is a good way for people to help out without having to actually hack on the kernel code.

FSM's kernel patch. Kernel patches do not normally come with press releases, or, at least, they didn't. This week, FSMLabs (the RTLinux company) [29]announced that it had released a memory management patch. It seems that a memory management change in 2.4 creates some difficulties for RTLinux, so they went and developed a fix. And announced it to the world.

[30]The patch itself is quite small, especially considering that the one real chunk of code there is lifted from the MIPS version of <asm/pgalloc.h>. It adds a couple of big kernel lock invocations, and a function which propagates page directory changes across processes and CPUs. That's evidently enough to restore low latency on a reliable basis for real-time tasks.

Other patches and updates released this week include:

  * David Miller has released [31]a new zero-copy networking patch. Note that this code can also be found in Alan Cox's 2.4.2-ac patches.

  * [32]ksymoops-2.4.1 was released by Keith Owens.

  * Andrew Morton has [33]fixed a number of problems with the secure attention key handling. There were some serious errors in that code; as he puts it, "it's pretty obvious that nobody has been testing SAK."

  * Eric Raymond has released [34]cml2-0.9.4, the latest version of his new kernel configuration scheme. It includes the performance improvements discussed in [35]last week's LWN.net kernel page.

  * Karim Yaghmour has [36]released version 0.9.4 of the Linux Trace Toolkit. It works with Linux and RTAI on the x86 and PowerPC architectures.

  * IBM has released [37]version 0.2.1 of its "JFS" journaled filesystem.

  * Andreas Gruenbacher has [38]released version 0.7.9 of the access control list patch. This patch fixes two problems with NFS.

  * [39]iptables 1.2.1 is out. See the [40]small fix that was posted shortly afterward, though, if you want to install this release.
Section Editor: [41]Jonathan Corbet

March 22, 2001

For other kernel news, see:

  * [42]Kernelnotes
  * [43]Kernel traffic
  * [44]Kernel Newsflash
  * [45]Kernel Trap

Other resources:

  * [46]Kernel Source Reference
  * [47]L-K mailing list FAQ
  * [48]Linux-MM
  * [49]Linux Scalability Project
  * [50]Kernel Newbies

Copyright © 2001 [53]Eklektix, Inc., all rights reserved
Linux (R) is a registered trademark of Linus Torvalds

References

   1. http://lwn.net/
   2. http://ads.tucows.com/click.ng/pageid=001-012-132-000-000-003-000-000-012
   3. http://lwn.net/2001/0322/
   4. http://lwn.net/2001/0322/security.php3
   5. http://lwn.net/2001/0322/dists.php3
   6. http://lwn.net/2001/0322/desktop.php3
   7. http://lwn.net/2001/0322/devel.php3
   8. http://lwn.net/2001/0322/commerce.php3
   9. http://lwn.net/2001/0322/press.php3
  10. http://lwn.net/2001/0322/announce.php3
  11. http://lwn.net/2001/0322/history.php3
  12. http://lwn.net/2001/0322/letters.php3
  13. http://lwn.net/2001/0322/bigpage.php3
  14. http://lwn.net/2001/0315/kernel.php3
  15. http://lwn.net/2001/0322/a/2.4.3pre5.php3
  16. http://lwn.net/2001/0322/a/lt-mmap_sem.php3
  17. http://lwn.net/2001/0322/a/lt-pre5.php3
  18. http://lwn.net/2001/0322/a/c-userspace-pointers.php3
  19. http://lwn.net/2001/0322/a/c-big-variables.php3
  20. http://lwn.net/2001/0322/a/c-devfs.php3
  21. http://lwn.net/2001/0322/a/c-locking.php3
  22. http://lwn.net/2001/0322/a/c-use-after-free.php3
  23. http://lwn.net/2001/0322/a/c-interrupt.php3
  24. http://lwn.net/2001/0322/a/c-failures.php3
  25. http://lwn.net/2001/0322/a/c-blocking.php3
  26. http://www.stanford.edu/~engler/mc-osdi.ps
  27. http://lwn.net/2001/0322/a/jffs2.php3
  28. http://lwn.net/2001/0322/a/man-pages.php3
  29. http://lwn.net/2001/0322/a/fsm-rtlinux.php3
  30. http://lwn.net/2001/0322/a/vmalloc_fix.php3
  31. http://lwn.net/2001/0322/a/zerocopy.php3
  32. http://lwn.net/2001/0322/a/ksymoops.php3
  33. http://lwn.net/2001/0322/a/SAK.php3
  34. http://lwn.net/2001/0322/a/cml.php3
  35. http://lwn.net/2001/0315/kernel.php3
  36. http://lwn.net/2001/0322/a/ltt.php3
  37. http://lwn.net/2001/0322/a/jfs.php3
  38. http://lwn.net/2001/0322/a/acl.php3
  39. http://lwn.net/2001/0322/a/iptables.php3
  40. http://lwn.net/2001/0322/a/iptables-fix.php3
  41. mailto:lwn@lwn.net
  42. http://www.kernelnotes.org/
  43. http://kt.zork.net/
  44. http://www.atnf.csiro.au/~rgooch/linux/docs/kernel-newsflash.html
  45. http://www.kerneltrap.com/
  46. http://lksr.org/
  47. http://www.tux.org/lkml/
  48. http://www.linux.eu.org/Linux-MM/
  49. http://www.citi.umich.edu/projects/linux-scalability/
  50. http://www.kernelnewbies.org/
  51. http://lwn.net/2001/0322/dists.php3
  52. http://www.eklektix.com/
  53. http://www.eklektix.com/

--- ifmail v.2.14.os7-aks1
 * Origin: Unknown (2:4615/71.10@fidonet)