 - RU.LINUX ---------------------------------------------------------------------
 From : Sergey Lentsov                       2:4615/71.10   22 Mar 2001  18:11:10
 To : All
 Subject : URL: http://lwn.net/2001/0322/kernel.php3
 -------------------------------------------------------------------------------- 
 
    See also: [14]last week's Kernel page.
    
 Kernel development
 
    The current kernel release is still 2.4.2. The current prepatch is
    2.4.3pre6, released early in the morning on March 21. The [15]patch
    log file is, as of this writing, only updated to 2.4.3pre5, however.
    
    No 2.2.19 prepatches have been released this week.
    
    Changing the memory map semaphore. One of the changes that is now in
    the 2.4.3 prepatch is a new memory map locking scheme implemented by
    Rik van Riel. The memory map semaphore controls access to the various
    virtual memory areas and page tables used by a process; it is intended
    to keep concurrent activities, such as page faults, memory map
    changes, and informational queries from stepping on each other. It is
    a fundamental part of how the virtual memory system works.
    
    It also, seemingly, is a performance problem. For example, programs
    that use the /proc interface to get process information can find
    themselves blocked for long periods of time. Page faults, too, can be
    slowed down, even when they occur in different places and should not
    conflict with each other. Multi-threaded programs, such as the MySQL
    server or Apache 2.0, are restricted to handling just one page fault
    at a time across the whole set of threads. In some cases, this
    restriction can lead to very poor performance.
    
    Rik's change is to turn the memory map semaphore into a variant known
    as a reader-writer semaphore (or R/W semaphore). These semaphores
    allow multiple threads to access a common data structure
    simultaneously, as long as none of them make any changes. Once
    somebody needs to change things, it must wait until all of the readers
    have finished their business, then lock them out for the duration of
    the change.
    
    An R/W semaphore suits this situation well, since both the /proc and
    page fault cases do not actually need to change the memory map. With
    the change applied, the system can do more things simultaneously. Even
    on uniprocessor systems, things will work better, since other work need not
    wait for the resolution of a page fault, which can involve disk
    activity.
    
    It's also a relatively fundamental and scary change for a stable
    kernel release. Even Linus, while [16]accepting the change, is a
    little nervous about it:
    
      I'm applying this to my tree - I'm not exactly comfortable with
      this during the 2.4.x timeframe, but at the same time I'm even less
      comfortable with the current alternative, which is to make the
      regular semaphores fairer (we tried it once, and the implementation
      had problems, I'm not going to try that again during 2.4.x).
      
    The patch also, [17]as of 2.4.3pre5, "has only been tested on i386
    without PAE, and is known to break other architectures." There have
    been some good reports, though, on the performance effects of this
    patch. But it may mean that the real 2.4.3 will not be out for a while
    yet, since Linus will want to give it some time to stabilize and prove
    that everything works.
    
    Global kernel analysis. Dawson Engler, at Stanford, has put together
    an extension to the gcc compiler which allows it to perform detailed,
    global analysis of a body of code and point out a number of possible
    bugs. Over the last week, he and his students have been posting the
    results of this work. They have found some impressive things,
    including:
      * Places where pointers are interpreted as user-space addresses
        (i.e. they are passed to a function like copy_to_user), but where
        the same pointer is also dereferenced directly ([18]nine cases).
        Kernel code running in process context can generally get away with
        that sort of reference, but it's risky for a few reasons. The
        user-space address may not be valid (or the page could have been
        swapped out since the kernel last checked), and there are security
        implications as well.
      * Large variables on the kernel stack ([19]22 cases, plus [20]a few
        more when devfs is used). The kernel stack is limited in size, and
        putting large variables there risks overflowing the allocated
        space.
      * Various locking bugs ([21]16 cases). These include paths that
        could take out a lock and forget to unlock it, and potential
        misuse of the processor state flags.
      * Places where kernel memory is used after it has been freed ([22]14
        cases).
      * Inconsistent treatment of interrupts ([23]28 cases). Code that
        sometimes runs with interrupts enabled and other times not is
        likely to be buggy; functions which sometimes forget to reenable
        interrupts certainly are.
      * Places where a pointer returned by a function that can fail is not
        checked ([24]120 cases).
      * Calls to functions that can block while interrupts are disabled or
        spinlocks are held ([25]163 cases). Kernel code, of course, should
        not block in either case, or serious performance problems (or
        deadlocks) can result.
        
    The response from the kernel hackers has been quite positive, for one
    simple reason: quite a few new bugs have been found. Many of the
    things being tested for are the sort of subtle bug that can be very
    easy to create and hard to track down.
    
    The tool that is doing this work is called "MC" ("meta-level
    compilation"); it was created by a team headed by Mr. Engler and
    sponsored by DARPA grant MDA904-98-C-A933. MC defines an extension
    language for gcc called "metal," which can be used to program specific
    checks to be applied to the code. Here, for example, is a piece of
    code which looks for errors in enabling and disabling interrupts:
 { #include "linux-includes.h" }
 sm check_interrupts {
   // Variables used in patterns
   decl { unsigned } flags;
 
   // Patterns to specify enable/disable functions
   pat enable = { sti(); }
              | { restore_flags(flags); };
   pat disable = { cli(); };
 
   // States
   // The first state is the initial state
   is_enabled: disable ==> is_disabled
      | enable ==> { err("double enable"); };
   is_disabled: enable ==> is_enabled
      | disable ==> { err("double disable"); }
      // Special pattern that matches when the SM
      // hits the end of any path in this state
      | $end_of_path$ ==> { err("exiting w/intr disabled!"); };
 }
 
    Those who are interested in MC should check out Mr. Engler's paper
    "Checking system rules using system-specific, programmer-written
    compiler extensions," which is available on the net [26]in PostScript
    format. The code fragment above was taken from that paper. Please
    don't bug Mr. Engler about obtaining the code, however; the system is
    still under development and has not yet been generally released. In
    time, though, it should become part of the standard kernel hacker's
    toolkit.
    
    JFFS2 released. The folks at Red Hat have [27]announced the release of
    the JFFS2 filesystem. It's a complete reimplementation of Axis
    Communications' Journaling Flash Filesystem, with a number of
    improvements. It's available via CVS, and only works with the 2.4
    kernel. An iPAQ kernel with JFFS2 built in is available as well.
    
    Help out the kernel manual pages. Andries Brouwer has released
    [28]man-pages-1.35. In the announcement, he notes:
    
      David Mosberger expressed his worry that especially man page
      Section 2 is out-dated and x86 specific, with no indication that
      other architectures even exist. No doubt he is right.
      
    So the request has gone out: please point out the man pages that are
    wrong, and, if possible, supply fixes while you're at it. This is a
    good way for people to help out without having to actually hack on the
    kernel code.
    
    FSM's kernel patch. Kernel patches do not normally come with press
    releases, or, at least, they didn't. This week, FSMLabs (the RTLinux
    company) [29]announced that it had released a memory management patch.
    It seems that a memory management change in 2.4 creates some
    difficulties for RTLinux, so they went and developed a fix. And
    announced it to the world.
    
    [30]The patch itself is quite small, especially considering that the
    one real chunk of code there is lifted from the MIPS version of
    <asm/pgalloc.h>. It adds a couple of big kernel lock invocations, and
    a function which propagates page directory changes across processes
    and CPUs. That's evidently enough to restore low latency on a reliable
    basis for real-time tasks.
    
    Other patches and updates released this week include:
    
      * David Miller has released [31]a new zero-copy networking patch.
        Note that this code can also be found in Alan Cox's 2.4.2-ac
        patches.
      * [32]ksymoops-2.4.1 was released by Keith Owens.
      * Andrew Morton has [33]fixed a number of problems with the secure
        attention key handling. There were some serious errors in that
        code; as he puts it, "it's pretty obvious that nobody has been
        testing SAK."
      * Eric Raymond has released [34]cml2-0.9.4, the latest version of
        his new kernel configuration scheme. It includes the performance
        improvements discussed in [35]last week's LWN.net kernel page.
      * Karim Yaghmour has [36]released version 0.9.4 of the Linux Trace
        Toolkit. It works with Linux and RTAI on the x86 and PowerPC
        architectures.
      * IBM has released [37]version 0.2.1 of its "JFS" journaled
        filesystem.
      * Andreas Gruenbacher has [38]released version 0.7.9 of the access
        control list patch. This patch fixes two problems with NFS.
      * [39]iptables 1.2.1 is out. See the [40]small fix that was posted
        shortly afterward, though, if you want to install this release.
        
    Section Editor: [41]Jonathan Corbet
    March 22, 2001
    
    For other kernel news, see:
      * [42]Kernelnotes
      * [43]Kernel traffic
      * [44]Kernel Newsflash
      * [45]Kernel Trap
    
    Other resources:
      * [46]Kernel Source Reference
      * [47]L-K mailing list FAQ
      * [48]Linux-MM
      * [49]Linux Scalability Project
      * [50]Kernel Newbies
    
    
    
    
    [52]Eklektix, Inc. Linux powered! Copyright © 2001 [53]Eklektix, Inc.,
    all rights reserved
    Linux (R) is a registered trademark of Linus Torvalds
 
 References
 
    1. http://lwn.net/
    2. http://ads.tucows.com/click.ng/pageid=001-012-132-000-000-003-000-000-012
    3. http://lwn.net/2001/0322/
    4. http://lwn.net/2001/0322/security.php3
    5. http://lwn.net/2001/0322/dists.php3
    6. http://lwn.net/2001/0322/desktop.php3
    7. http://lwn.net/2001/0322/devel.php3
    8. http://lwn.net/2001/0322/commerce.php3
    9. http://lwn.net/2001/0322/press.php3
   10. http://lwn.net/2001/0322/announce.php3
   11. http://lwn.net/2001/0322/history.php3
   12. http://lwn.net/2001/0322/letters.php3
   13. http://lwn.net/2001/0322/bigpage.php3
   14. http://lwn.net/2001/0315/kernel.php3
   15. http://lwn.net/2001/0322/a/2.4.3pre5.php3
   16. http://lwn.net/2001/0322/a/lt-mmap_sem.php3
   17. http://lwn.net/2001/0322/a/lt-pre5.php3
   18. http://lwn.net/2001/0322/a/c-userspace-pointers.php3
   19. http://lwn.net/2001/0322/a/c-big-variables.php3
   20. http://lwn.net/2001/0322/a/c-devfs.php3
   21. http://lwn.net/2001/0322/a/c-locking.php3
   22. http://lwn.net/2001/0322/a/c-use-after-free.php3
   23. http://lwn.net/2001/0322/a/c-interrupt.php3
   24. http://lwn.net/2001/0322/a/c-failures.php3
   25. http://lwn.net/2001/0322/a/c-blocking.php3
   26. http://www.stanford.edu/~engler/mc-osdi.ps
   27. http://lwn.net/2001/0322/a/jffs2.php3
   28. http://lwn.net/2001/0322/a/man-pages.php3
   29. http://lwn.net/2001/0322/a/fsm-rtlinux.php3
   30. http://lwn.net/2001/0322/a/vmalloc_fix.php3
   31. http://lwn.net/2001/0322/a/zerocopy.php3
   32. http://lwn.net/2001/0322/a/ksymoops.php3
   33. http://lwn.net/2001/0322/a/SAK.php3
   34. http://lwn.net/2001/0322/a/cml.php3
   35. http://lwn.net/2001/0315/kernel.php3
   36. http://lwn.net/2001/0322/a/ltt.php3
   37. http://lwn.net/2001/0322/a/jfs.php3
   38. http://lwn.net/2001/0322/a/acl.php3
   39. http://lwn.net/2001/0322/a/iptables.php3
   40. http://lwn.net/2001/0322/a/iptables-fix.php3
   41. mailto:lwn@lwn.net
   42. http://www.kernelnotes.org/
   43. http://kt.zork.net/
   44. http://www.atnf.csiro.au/~rgooch/linux/docs/kernel-newsflash.html
   45. http://www.kerneltrap.com/
   46. http://lksr.org/
   47. http://www.tux.org/lkml/
   48. http://www.linux.eu.org/Linux-MM/
   49. http://www.citi.umich.edu/projects/linux-scalability/
   50. http://www.kernelnewbies.org/
   51. http://lwn.net/2001/0322/dists.php3
   52. http://www.eklektix.com/
   53. http://www.eklektix.com/
 
 --- ifmail v.2.14.os7-aks1
  * Origin: Unknown (2:4615/71.10@fidonet)
 
 
