 - RU.LINUX ---------------------------------------------------------------------
 From : Sergey Lentsov                       2:4615/71.10   02 Feb 2001  04:07:03
 To : All
 Subject : URL: http://lwn.net/2001/0201/kernel.php3
 -------------------------------------------------------------------------------- 
 
    See also: [13]last week's Kernel page.
    
 Kernel development
 
    The current kernel release is 2.4.1; it was [14]released on
    January 29. It contains a fair number of fixes for problems that came
    up with 2.4.0 and, of course, ReiserFS.
    
    Alan Cox's latest, meanwhile, is [15]2.4.0-ac12. It has almost
    everything that's in 2.4.1 along with a vast number of other fixes,
    many of which have been sent in by a squad of kernel "janitors" who
    are going through looking for things to clean up.
    
    More fun with ECN. ECN (Explicit Congestion Notification) is [16]an
    experimental IETF standard for TCP traffic. By making use of a couple
    of "reserved" bits in the TCP header, ECN allows routers to signal the
    presence of congestion on a network path; the systems sending data can
    then throttle back their output somewhat and avoid dropped packets. It
    can be a significant improvement for wide-area network communications.
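
    As a rough illustration of where those bits live, the sketch below
    assumes the usual C rendering of the RFC 2481 assignments: CWR is
    0x80 and ECN-Echo (ECE) is 0x40 in the TCP flags byte, next to the
    classic SYN/ACK/RST flags. The macro and function names are
    hypothetical and not taken from the kernel source; a SYN carrying
    both bits is how a sender announces that it would like to use ECN.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag bits in byte 13 of the TCP header. The top two were "reserved"
 * before ECN claimed them. */
#define TCP_FLAG_FIN 0x01
#define TCP_FLAG_SYN 0x02
#define TCP_FLAG_RST 0x04
#define TCP_FLAG_ACK 0x10
#define TCP_FLAG_ECE 0x40  /* ECN-Echo */
#define TCP_FLAG_CWR 0x80  /* Congestion Window Reduced */

/* Hypothetical helper: true for an initial SYN that asks for ECN,
 * i.e. SYN set, ACK clear, and both ECE and CWR set. */
bool is_ecn_setup_syn(uint8_t flags)
{
    uint8_t ecn = TCP_FLAG_ECE | TCP_FLAG_CWR;

    return (flags & TCP_FLAG_SYN) &&
           !(flags & TCP_FLAG_ACK) &&
           (flags & ecn) == ecn;
}
```

    A plain SYN (0x02) is not an ECN request under this check, while
    0xC2 (SYN plus both ECN bits) is.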
    
    The Linux networking stack in 2.4 supports ECN (thanks to the efforts
    of Jamal Hadi Salim), and will use it if told to do so. Unfortunately,
    not all systems on the net react well to ECN; in particular, a set of
    Cisco firewall products will refuse connections with the ECN bits set
    (Cisco has a patch available, but many sites have not applied it). The
    end result is that, if you use ECN, a significant part of the network
    will be unreachable. Thus, most people using 2.4 have to disable ECN,
    either by configuring it out of the kernel completely or by disabling
    it at run time:
     echo 0 > /proc/sys/net/ipv4/tcp_ecn
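
    The same setting can be inspected from a program. The sketch below
    is a minimal user-space example, not anything shipped with the
    kernel; read_sysctl_int() is a hypothetical helper name, and the
    only path taken from the text above is /proc/sys/net/ipv4/tcp_ecn.

```c
#include <stdio.h>

/* Returns the integer stored in a sysctl-style file, or -1 if the file
 * cannot be opened or does not start with a number. */
int read_sysctl_int(const char *path)
{
    FILE *f = fopen(path, "r");
    int value;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

    Calling read_sysctl_int("/proc/sys/net/ipv4/tcp_ecn") should yield
    0 once ECN has been turned off as shown above; -1 means the file
    could not be read, for instance because the running kernel does not
    provide it.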
 
    All the above has been known for some time, but the discussion got a
    fresh start this week when it was pointed out that [17]Hotmail is one
    of the sites that is unreachable when ECN is used. Some interesting
    questions came up as part of that discussion.
    
    The first was, simply, "why bother with ECN, since it breaks so much
    of the net?" The answer, of course, is that ECN will, eventually, make
    the net work better. In the meantime, people have to start
    implementing and deploying it. As the net becomes more ECN-compliant,
    the networks that still do not work with ECN will feel an increasing
    pressure to fix the problem.
    
    Next question: wouldn't it be possible to automatically retry failed
    connections without ECN? There are two issues with that approach. The
    first is that the systems in question reject the connection with a TCP
    reset (RST) packet. To ignore that RST and retry the connection would
    violate the TCP protocol and risk creating no end of problems. The
    other is again one of pressuring sites to fix their software; if the
    net silently works around their breakage, they'll never feel the need
    to upgrade.
    
    Of course, not everybody agrees with the need to pressure people to
    upgrade. There are two camps on the question of whether the firewalls
    in question are really broken. One side, championed by networking
    hacker David Miller, says that "reserved" means that the bits in the
    header will be used for something cool at some point. When that use
    happens, older software shouldn't break. Others, however, believe that
    a firewall should reject packets that contain bits it doesn't
    recognize. Those bits could well indicate a new feature that subverts
    the firewall's security scheme.
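
    The two positions can be put side by side in a toy model. Everything
    in the sketch below is hypothetical: it is not how any real firewall
    product is implemented, the function names are made up, and the one
    shared rule (drop segments with both SYN and FIN set) only exists to
    give the policies something to agree on. The flag values assume the
    usual layout of the TCP flags byte, where ECN's two bits are 0x40
    and 0x80.

```c
#include <stdbool.h>
#include <stdint.h>

#define TCP_FLAG_FIN      0x01
#define TCP_FLAG_SYN      0x02
#define TCP_FLAGS_PRE_ECN 0x3F  /* FIN, SYN, RST, PSH, ACK, URG */

/* Toy rule shared by both policies: SYN and FIN together is nonsense. */
static bool base_rules_accept(uint8_t flags)
{
    return !((flags & TCP_FLAG_SYN) && (flags & TCP_FLAG_FIN));
}

/* "Strict" camp: refuse any segment that sets a bit the firewall does
 * not recognize, then apply the ordinary rules. */
bool strict_firewall_accepts(uint8_t flags)
{
    if (flags & ~TCP_FLAGS_PRE_ECN)
        return false;
    return base_rules_accept(flags);
}

/* "Tolerant" camp: ignore the formerly reserved bits and judge the
 * segment only by the flags that were already defined. */
bool tolerant_firewall_accepts(uint8_t flags)
{
    return base_rules_accept(flags & TCP_FLAGS_PRE_ECN);
}
```

    An ECN-requesting SYN (0xC2) is refused by the strict policy and
    passed by the tolerant one; both still refuse 0x03.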
    
    The fact that the ECN standard is still considered "experimental" also
    gives some ammunition to those who say the non-ECN-compliant systems
    should be accommodated.
    
    David Miller feels strongly about the issue, however, and has
    [18]stated his intent to put an ECN kernel on vger.kernel.org "in four
    weeks time." At that point, anybody who is behind a firewall that does
    not speak ECN will lose access to all of the mailing lists served by
    that host. Note that ECN is not required on any particular system;
    all that is necessary is that the firewall not reject packets trying
    to use ECN. For those who are concerned about the issue, David also
    posted [19]a way to test your network to see if it works properly with
    ECN.
    
    Linux has reached a point where its weight can be used to push things
    like network standards. One can only hope that this influence will be
    used wisely.
    
    A wealth of filesystems. Not that long ago, ext2 was the Linux
    filesystem. It's unlikely to give up its dominant position anytime
    soon, but ext2 is increasingly having to share the stage with other
    filesystems that have native Linux ports. ReiserFS, of course, is now
    a standard part of the kernel. This week also saw news of three other
    filesystems for Linux; they may not be quite as production-ready as
    ReiserFS, but they are getting there.
      * SGI has [20]announced the availability of the "prerelease 0.9"
        version of its XFS journaling filesystem for the 2.4.0 kernel. It
        is available as a patch, or in RPM format for easy installation.
        This release is "stable in a majority of normal environments"
        according to SGI. Features claimed by XFS include, beyond
        journaling, very high performance (high data transfer speeds and
        quick directory lookups) and scalability. The file size limit on
        XFS is about 9 million terabytes, which should be sufficient to
        handle most people's needs for a while yet.
      * IBM has [21]announced "drop 24 release 0.1.4" of its Journaled
        File System (JFS). JFS was designed around journaling since the
        beginning; it uses B-trees for large directories but makes no
        particular performance claims.
      * Mountain View Data (the company formed by Peter Braam along with
        Cliff and Iris Miller) [22]will be demonstrating SnapFS at
        LinuxWorld. SnapFS is actually an add-on layer for an existing
        journaling filesystem which allows the taking of "snapshots" of
        the filesystem's state.
        
    Soon the hardest thing about installing Linux may be choosing which
    filesystem to use.
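
    The "about 9 million terabytes" figure quoted for XFS is what one
    gets from a 63-bit byte count. The sketch below assumes that the
    limit is 2^63 bytes (the range of a signed 64-bit file offset) and
    that a terabyte is 10^12 bytes; neither assumption is stated in the
    SGI announcement as summarized here.

```c
#include <stdint.h>

/* Assumed limit: 2^63 bytes, the range of a signed 64-bit file offset. */
#define ASSUMED_MAX_FILE_BYTES (UINT64_C(1) << 63)
#define BYTES_PER_TERABYTE     UINT64_C(1000000000000)  /* 10^12 */

/* Whole decimal terabytes that fit in the assumed limit. */
uint64_t max_file_size_in_terabytes(void)
{
    return ASSUMED_MAX_FILE_BYTES / BYTES_PER_TERABYTE;
}
```

    Under those assumptions the function returns 9223372, that is, a
    little over 9 million terabytes.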
    
    Avoiding bad sleeps. Conectiva's Arnaldo Carvalho de Melo recently
    announced his [23]Kernel Janitor's TODO list; it's meant to be a
    clearinghouse for people who are going through the code cleaning
    things up. Going through code to be sure it returns error codes
    properly would seem to be far less attractive than, say, writing
    another filesystem for Linux. There are quite a few people interested
    in doing janitorial tasks at the moment, however, and that work
    results in a more stable kernel.
    
    As part of that effort, it was suggested that the janitors look for
    and fix all code that calls sleep_on() (and, more commonly,
    interruptible_sleep_on(), but sleep_on() is easier for kernel page
    editors to type) since (1) almost all such code is incorrect, and
    (2) Linus has agreed that those functions should be removed in the 2.5
    development series. It quickly became clear that quite a few people,
    even those familiar with kernel code, didn't understand what the
    problem with sleep_on() was. So, for the curious, here's a description
    of an obscure bug that lives within a lot of kernel code.
    
    The purpose of sleep_on() is to suspend the current process until
    something of interest happens. That something could be a read from a
    disk, the arrival of data from the network, the availability of a
    kernel data structure, the expiration of a timer, or many other
    things. Running "ps aux" will show a lot of processes with "S" in the
    "STAT" field; they are all sleeping in this manner.
    
    The problem with sleep_on() is that there is necessarily a delay
    between the decision to sleep and actually sleeping. Code that sleeps
    usually looks something like:
     while (something_is_missing) {
         take_steps_to_make_it_available ();
         sleep_on (proper_wait_queue);
     }
 
    If the thing that is being slept on happens between the test in the
    while loop and the process actually going into a sleeping state within
    sleep_on(), the wakeup event will be lost and the process could sleep
    for a very long time. In the days of the 2.0 kernel and before, this
    problem did not arise often; nowadays, with SMP systems and
    fine-grained locking, this kind of race condition is much more likely
    to come about. It's still a rare occurrence (the window is quite
    small, usually), but, within operating system kernels,
    one-in-a-million events are regular occurrences.
    
    The proper way to handle this situation is to mark the process as
    sleeping and put it on the wait queue before testing for the needed
    condition. In effect, the process "sleepwalks" while testing to see
    if it really needs to wait. If the wakeup happens before the process
    gives up the processor, the process just gets put back into the
    running state and everything works as it should. The actual coding to
    sleep in this way is rather more complex than a simple sleep_on()
    call; see [24]this posting from David Woodhouse for an example of how
    it should be done. Alternatively, programmers can use the (relatively)
    new wait_event macro, which hides a lot of the details. Or one can set
    up a timeout to happen in a short while to wake up the process if
    nothing else does.
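
    The difference between the two orderings can be seen in a small
    single-threaded model. This is a user-space toy, not kernel code and
    not the code from the posting mentioned above: struct model,
    TASK_SLEEPING, and all of the function names are hypothetical, and
    the "race" is simply the event being fired at the worst possible
    moment in each sequence.

```c
#include <stdbool.h>

enum task_state { TASK_RUNNING, TASK_SLEEPING };

struct model {
    bool condition;          /* the thing being waited for */
    bool on_wait_queue;      /* is the task visible to wakers? */
    enum task_state state;
};

/* The waker: make the condition true, then wake whoever is queued.
 * If nobody is queued yet, the wakeup has no effect. */
static void event_happens(struct model *m)
{
    m->condition = true;
    if (m->on_wait_queue)
        m->state = TASK_RUNNING;
}

/* sleep_on() ordering: test, (event sneaks in), then queue and sleep.
 * Returns true if the task ends up asleep with the condition met. */
bool sleep_on_ordering_gets_stuck(void)
{
    struct model m = { false, false, TASK_RUNNING };

    if (!m.condition) {           /* test: nothing there yet */
        event_happens(&m);        /* wakeup arrives, finds no sleeper */
        m.on_wait_queue = true;   /* sleep_on(): queue ourselves ... */
        m.state = TASK_SLEEPING;  /* ... and go to sleep */
    }
    return m.state == TASK_SLEEPING && m.condition;
}

/* Safe ordering: queue and mark as sleeping first, then test. */
bool queue_first_ordering_gets_stuck(void)
{
    struct model m = { false, false, TASK_RUNNING };

    m.on_wait_queue = true;
    m.state = TASK_SLEEPING;      /* "sleepwalking": asleep on paper */
    if (!m.condition) {
        event_happens(&m);        /* same race, but the waker finds us */
        /* a real schedule() would return at once: state is RUNNING */
    }
    m.on_wait_queue = false;
    return m.state == TASK_SLEEPING && m.condition;
}
```

    In this model sleep_on_ordering_gets_stuck() returns true (the
    wakeup was lost), while queue_first_ordering_gets_stuck() returns
    false because the waker found the task on the queue and put it back
    into the running state.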
    
    A quick grep through the 2.4.1 kernel source shows well over 400 calls
    to sleep_on() and interruptible_sleep_on(). The kernel janitors have
    quite a bit of cleaning up to do.
    
    Other patches and updates released this week include:
    
      * Robert H. de Vries released [25]a new version of his POSIX timers
        patch for 2.4.
      * David Howells has posted [26]a patch implementing a facility he
        calls "task ornaments." It's a mechanism for attaching arbitrary
        information to the task structure, for code that needs to keep
        its own per-task data.
      * Compaq has released [27]a driver for its PCI hot plug controller.
      * Dennis Koslowski has posted [28]a portscan detector module for
        netfilter.
      * [29]A Dolphin PCI Scalable Coherent Interface driver for the 2.4
        kernel was posted by Jeff Merkey.
      * Werner Almesberger has posted [30]an RFC for a new traffic control
        configuration language. The current way of configuring network
        traffic control can be a little difficult at times...
      * David Miller continues to update his [31]experimental zero-copy
        networking patch.
      * Jeff Dike has posted [32]a new version of his user-mode Linux port
        which works with the 2.4.1 kernel.
      * Rik van Riel has set up [33]a Linux memory management bug tracking
        system to help him keep on top of what's happening with the VM
        subsystem. Shortly thereafter he [34]changed the URL to use his
        newly-registered linux-mm.org domain.
      * Harald Welte has set up [35]a new mailing list to discuss a
        proposed netfilter failover implementation.
      * Daniel Phillips posted [36]a proposal for a 'cleaner, whiter'
        timer interface designed to make life easier and avoid race
        conditions.
        
    Section Editor: [37]Jonathan Corbet
    February 1, 2001
    
    For other kernel news, see:
      * [38]Kernelnotes
      * [39]Kernel traffic
      * [40]Kernel Newsflash
      * [41]Kernel Trap
    
    Other resources:
      * [42]Kernel Source Reference
      * [43]L-K mailing list FAQ
      * [44]Linux-MM
      * [45]Linux Scalability Project
    
    
    
    
    [47]Eklektix, Inc. Linux powered! Copyright © 2001 [48]Eklektix, Inc.,
    all rights reserved
    Linux ® is a registered trademark of Linus Torvalds
 
 References
 
    1. http://lwn.net/
    2. http://ads.tucows.com/click.ng/pageid=001-012-132-000-000-003-000-000-012
    3. http://lwn.net/2001/0201/
    4. http://lwn.net/2001/0201/security.php3
    5. http://lwn.net/2001/0201/dists.php3
    6. http://lwn.net/2001/0201/devel.php3
    7. http://lwn.net/2001/0201/commerce.php3
    8. http://lwn.net/2001/0201/press.php3
    9. http://lwn.net/2001/0201/announce.php3
   10. http://lwn.net/2001/0201/history.php3
   11. http://lwn.net/2001/0201/letters.php3
   12. http://lwn.net/2001/0201/bigpage.php3
   13. http://lwn.net/2001/0125/kernel.php3
   14. http://lwn.net/2001/0201/a/2.4.1.php3
   15. http://lwn.net/2001/0201/a/2.4.0-ac12.php3
   16. http://www.landfield.com/rfcs/rfc2481.html
   17. http://lwn.net/2001/0201/a/hotmail-ecn.php3
   18. http://lwn.net/2001/0201/a/dm-ecn.php3
   19. http://lwn.net/2001/0201/a/ecn-test.php3
   20. http://lwn.net/2001/0201/a/sgi-xfs.php3
   21. http://lwn.net/2001/0201/a/jfs.php3
   22. http://lwn.net/2001/0201/a/snapfs.php3
   23. http://bazar.conectiva.com.br/~acme/TODO
   24. http://lwn.net/2001/0201/a/sleep-fix.php3
   25. http://lwn.net/2001/0201/a/posix-timers.php3
   26. http://lwn.net/2001/0201/a/task-ornaments.php3
   27. http://lwn.net/2001/0201/a/compaq-hotplug.php3
   28. http://lwn.net/2001/0201/a/netfilter-portscan.php3
   29. http://lwn.net/2001/0201/a/sci.php3
   30. http://lwn.net/2001/0201/a/tcng.php3
   31. http://lwn.net/2001/0201/a/zerocopy.php3
   32. http://lwn.net/2001/0201/a/user-mode.php3
   33. http://lwn.net/2001/0201/a/mm-bugzilla.php3
   34. http://lwn.net/2001/0201/a/bugzilla-mm-move.php3
   35. http://lwn.net/2001/0201/a/netfilter-failover.php3
   36. http://lwn.net/2001/0201/a/cleaner-whiter.php3
   37. mailto:lwn@lwn.net
   38. http://www.kernelnotes.org/
   39. http://kt.linuxcare.com/
   40. http://www.atnf.csiro.au/~rgooch/linux/docs/kernel-newsflash.html
   41. http://www.kerneltrap.com/
   42. http://lksr.org/
   43. http://www.tux.org/lkml/
   44. http://www.linux.eu.org/Linux-MM/
   45. http://www.citi.umich.edu/projects/linux-scalability/
   46. http://lwn.net/2001/0201/dists.php3
   47. http://www.eklektix.com/
   48. http://www.eklektix.com/
 --- ifmail v.2.14.os7-aks1
  * Origin: Unknown (2:4615/71.10@fidonet)
 
 
