 - RU.LINUX ---------------------------------------------------------------------
 From : Sergey Lentsov                       2:4615/71.10   26 Jan 2001  11:46:03
 To : All
 Subject : URL: http://lwn.net/2001/0125/kernel.php3
 -------------------------------------------------------------------------------- 
 
    See also: [13]last week's Kernel page.
    
 Kernel development
 
    The current kernel release is still 2.4.0. Linus continues to put
    together a 2.4.1 prepatch, currently at [14]2.4.1-pre10. His approach
    remains conservative, and this patch (especially if you ignore
    ReiserFS) is relatively small.
    
    Those looking for something meatier may want to consider, instead,
    [15]2.4.0-ac11 from Alan Cox. This release contains literally hundreds
    of patches - almost 10MB worth.
    
    Cutting out the middleman in data transfers. The discussion started by
    David Miller's posting of an experimental zero-copy networking
    implementation (discussed on this page [16]two weeks ago) continues,
    though it has moved into new areas. One of those is the optimization
    of data transfers to avoid copying the data as much as possible.
    Consider, for example, the sendfile() interface that Linux supports
    now; using sendfile(), an application (a web server, say) can transfer
    a disk file to a network socket without ever having to read it into
    user space. There is an obvious performance gain from operating in
    this mode for certain applications.
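
    As a rough illustration of the pattern described above, here is a
    minimal sketch in Python, whose os.sendfile() wraps the same system
    call. It assumes a Linux system; the helper names send_file and demo
    are made up for this example, and a local socket pair stands in for
    a real network connection.

```python
import os
import socket
import tempfile


def send_file(sock, path):
    """Copy the file at `path` to `sock` with os.sendfile(); return bytes sent."""
    offset = 0
    with open(path, "rb") as src:
        size = os.fstat(src.fileno()).st_size
        while offset < size:
            # The kernel moves the data; it is never read into this process.
            sent = os.sendfile(sock.fileno(), src.fileno(), offset, size - offset)
            if sent == 0:
                break
            offset += sent
    return offset


def demo(payload=b"page data\n" * 100):
    """Send `payload` through a local socket pair and return what arrived."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
    # Assumption: a socket pair stands in for a client connection.
    sender, receiver = socket.socketpair()
    try:
        send_file(sender, tmp.name)
        sender.close()  # signal end of stream to the receiver
        chunks = []
        while True:
            chunk = receiver.recv(65536)
            if not chunk:
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        sender.close()
        receiver.close()
        os.unlink(tmp.name)


if __name__ == "__main__":
    print(len(demo()), "bytes received")
```

    The loop is there because sendfile() may transfer fewer bytes than
    requested; the application only tracks an offset and never touches
    the file contents itself.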
    
    So, why not extend the idea to its logical conclusion? Why not have a
    system call that says "copy data from here to there, and optimize as
    much as possible"? One approach to this mode is [17]Larry McVoy's
    'splice' interface, which tries to provide a general way for user
    space processes to control high-performance copies. It provides "push"
    and "pull" primitives which handle the destination and source sides of
    a copy, respectively, and give the application some latitude in how
    the two are put together.
    
    Here are [18]Linus's comments on splice and why it has not been
    implemented so far. Essentially, sendfile handled the task that most
    users wanted, the splice interface needed a bit of work, and it didn't
    fit well into the structure of the kernel at the time. The kernel has
    since evolved, and Linus's message hints that an implementation of a
    modified form of splice would be easier now, and that it might even be
    accepted.
    
    One can take the idea further, however: why not, when appropriate,
    simply tell the hardware to copy the data between devices directly and
    leave the kernel (and the processor) out of it altogether? According
    to Linus, that's one of those great ideas that turns out not to be so
    great in practice. His [19]short response to the idea was:
    
      device-to-device copies sound like the ultimate thing.
      
      They suck. They add a lot of complexity and do not work in general.
      And, if your "normal" usage pattern really is to just move the data
      without even looking at it, then you have to ask yourself whether
      you're doing something worthwhile in the first place.
      
    Further into the discussion, Linus came up with other reasons to avoid
    direct device-to-device (D2D?) copies. One is that [20]there is very
    little use for the capability in the end. One can talk, for example,
    of streaming video directly to disk - but how often will a user be
    recording video without wanting to look at it too? Another is that
    [21]very little hardware supports that mode of operation. Linus sees a
    trend toward connecting hardware with direct, point-to-point links
    that are not amenable to direct operations between devices. Quoth
    Linus: "Just wait. My crystal ball is infallible."
    
    TCP_CORK or MSG_MORE? Another branch of the same discussion has to do
    with getting optimal performance from network transfers. Imagine a web
    server using the sendfile() interface described above. In response to
    a request for a page, the server will first write out a short set of
    HTTP headers, then use sendfile() to actually transfer the page data.
    By the time the sendfile() call is actually made, however, the headers
    will have gone out on the net as a very short packet. The result is
    poor performance on both the sending and receiving sides.
    
    Linux has handled this issue with a TCP option called TCP_CORK. If an
    application sets that option on a socket, the kernel will not send out
    short packets. Instead, it will wait until enough data has shown up to
    fill a maximum-size packet, then send it. When TCP_CORK is turned off,
    any remaining data will go out on the wire.
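
    The header-then-file sequence described above might look roughly
    like this with TCP_CORK. This is only a sketch, assuming Linux and
    Python's socket.TCP_CORK constant; the helper names tcp_pair,
    send_response_corked and demo are invented for the example, and a
    loopback connection stands in for a real client.

```python
import os
import socket
import tempfile


def tcp_pair():
    """Return a connected (client, server) pair of TCP sockets on loopback."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    listener.close()
    return client, server


def send_response_corked(sock, headers, path):
    """Send headers, then the file body, with TCP_CORK set in between."""
    offset = 0
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 1)  # cork the socket
    try:
        sock.sendall(headers)  # short write, held back while corked
        with open(path, "rb") as src:
            size = os.fstat(src.fileno()).st_size
            while offset < size:
                sent = os.sendfile(sock.fileno(), src.fileno(), offset, size - offset)
                if sent == 0:
                    break
                offset += sent
    finally:
        # Turning the option off lets any remaining data go out.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 0)
    return len(headers) + offset


def demo(headers, body):
    """Send headers plus body over loopback and return the bytes received."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(body)
    client, server = tcp_pair()
    try:
        send_response_corked(server, headers, tmp.name)
        server.close()
        chunks = []
        while True:
            chunk = client.recv(65536)
            if not chunk:
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        server.close()
        client.close()
        os.unlink(tmp.name)


if __name__ == "__main__":
    reply = demo(b"HTTP/1.0 200 OK\r\n\r\n", b"<html>hello</html>\n")
    print(len(reply), "bytes received")
```

    The receiver sees the same byte stream either way; the option only
    affects how the bytes are grouped into packets, which this sketch
    does not measure.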
    
    TCP_CORK does the job reasonably well. Recently, however, a contingent
    led by Ingo Molnar has been [22]pushing for a new interface which uses
    a flag called MSG_MORE. Rather than applying to the socket in general,
    MSG_MORE is attached to one or more write operations on that socket.
    It says "there will be more data coming," and the kernel knows to
    buffer data to get bigger packets. The advantages of this approach are
    said to be (1) it requires no persistent state on the socket, thus
    helping, among other things, to avoid programming errors; and (2) it
    avoids the system call overhead of toggling the TCP_CORK flag. Ingo
    used MSG_MORE in the implementation of the TUX kernel web server, and
    is happy with the results.
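
    For comparison, a per-write flag of this kind might be used as in
    the sketch below. It assumes a later Linux system on which send()
    accepts MSG_MORE from user space (which the kernels discussed here
    do not yet offer) and Python's socket.MSG_MORE constant. The helper
    names are invented, and the body is sent with an ordinary write
    rather than sendfile() to keep the example short.

```python
import socket


def tcp_pair():
    """Return a connected (client, server) pair of TCP sockets on loopback."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    listener.close()
    return client, server


def send_response(sock, headers, body):
    """Send headers flagged with MSG_MORE, then the body with no flag."""
    # MSG_MORE says more data is coming, so the kernel may hold these bytes.
    sock.sendall(headers, socket.MSG_MORE)
    # The final write carries no flag, so nothing needs to be held any longer.
    sock.sendall(body)


def demo(headers, body):
    """Send headers plus body over loopback and return the bytes received."""
    client, server = tcp_pair()
    try:
        send_response(server, headers, body)
        server.close()
        chunks = []
        while True:
            chunk = client.recv(65536)
            if not chunk:
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        server.close()
        client.close()


if __name__ == "__main__":
    reply = demo(b"HTTP/1.0 200 OK\r\n\r\n", b"<html>hello</html>\n")
    print(len(reply), "bytes received")
```

    Note that there is no setsockopt() call and no state left on the
    socket afterwards: the hint travels with each write.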
    
    Linus, however, is not convinced. MSG_MORE requires a flag to be set
    on every transfer, only works on sockets, and requires that the code
    that is doing the writing be aware of the flag. TCP_CORK, instead,
    works with programs using the standard I/O package, and it can be set
    on sockets that are passed to other applications, such as CGI scripts,
    that are completely unaware of its presence. The TCP_CORK flag
    preserves a lot more of the standard Unix stream semantics.
    
    Conclusion: don't expect to see MSG_MORE show up in user space anytime
    soon.
    
    Fixing the 2.4.0 USB breakage. When 2.4.0 came out, it included a
    last-minute change to the usb_device_id structure, which is used to
    find driver modules for specific USB devices. Unfortunately, the form
    of this change was such that it broke the USB autoloading mechanism
    entirely. Since then, the USB maintainers, along with modutils
    maintainer Keith Owens, have been trying to figure out a way to make
    things work again.
    
    The problem is that modutils, which handles the actual module loading
    process, cannot distinguish the new usb_device_id structure from the
    old one. Making modutils work with the 2.4.0 version of the structure
    is not a problem - but then it will cease to work for earlier
    versions. Keith Owens places great importance on backward
    compatibility, and does not want to break things for any version. So
    he has produced [23]a kernel patch which adds a version number to the
    relevant structures. With versioning, changes can be detected and
    everything can be made to work.
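
    The general idea of a version-tagged binary table can be sketched
    as follows. The layouts and field names here are purely
    hypothetical and are not the real usb_device_id format or the
    format used by the patch; the point is only that a leading version
    number lets a reader choose the right layout instead of guessing.

```python
import struct

# Hypothetical entry layouts, keyed by table version (not the real formats).
LAYOUTS = {
    1: ("<HH", ("vendor", "product")),
    2: ("<HHH", ("match_flags", "vendor", "product")),
}


def pack_table(version, entries):
    """Serialize entries, prefixed by a 4-byte version number."""
    fmt, fields = LAYOUTS[version]
    blob = struct.pack("<I", version)
    for entry in entries:
        blob += struct.pack(fmt, *(entry[name] for name in fields))
    return blob


def parse_table(blob):
    """Read the version first, then decode entries with the matching layout."""
    (version,) = struct.unpack_from("<I", blob, 0)
    if version not in LAYOUTS:
        raise ValueError(f"unsupported table version {version}")
    fmt, fields = LAYOUTS[version]
    size = struct.calcsize(fmt)
    body = blob[4:]
    if len(body) % size:
        raise ValueError("table length does not match its version")
    entries = [
        dict(zip(fields, struct.unpack_from(fmt, body, offset)))
        for offset in range(0, len(body), size)
    ]
    return version, entries


if __name__ == "__main__":
    old = pack_table(1, [{"vendor": 0x1234, "product": 1}])
    new = pack_table(2, [{"match_flags": 3, "vendor": 0x1234, "product": 1}])
    print(parse_table(old))
    print(parse_table(new))
```

    Without the version prefix, a 12-byte table in this scheme could be
    read either as three old entries or as two new ones, which is the
    kind of ambiguity the version number removes.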
    
    Linus, however, [24]does not want to apply the patch. It is, after
    all, a binary interface change; such changes are generally avoided
    within a stable kernel series. Besides, the only other kernels which
    used the USB device table were the 2.4.0-test kernels - that structure
    was added in 2.4.0-test10. Nobody feels all that bad about breaking
    the prerelease kernels, in the end.
    
    Almost nobody, that is; Mr. Owens is still not entirely happy. He has
    released [25]modutils-2.4.2 which makes the 2.4.0 format work, but he
    has done so "under protest." People who want to be able to switch
    between 2.4.0 and the 2.4.0-test kernels will have to keep two
    versions of modutils around; everybody else can just install 2.4.2 and
    USB autoloading will work again.
    
    Should the kbuild list move to SourceForge? Michael Elizabeth Chastain
    has posted [26]a proposal to move the kbuild mailing list (which
    discusses the kernel configuration and building system) to a
    SourceForge project. He has a few reasons, but any kbuild reader will
    know the first one intuitively: spam routinely exceeds real postings
    on that list. With luck, moving to a site with better spam filtering
    would help to make the list usable again.
    
    The one objection to the move came in the form of [27]this posting,
    which raised the concern that the free software world is becoming too
    dependent on SourceForge.
    
      But it just concerns me when a single company has the ability to
      (temporarily) freeze the development of half the world's
      open-source software just by unplugging a roomful of servers,
      either voluntarily or not (think "court order").
      
    This is a concern that LWN has raised in the past as well. This time,
    however, there was a semi-official response in the form of [28]this
    message from Eric Raymond, who is on the VA Linux board of directors.
    According to Eric:
    
      We're not blind to this problem. We don't want to be a chokepoint;
      it's in VA's interest for the community to know it's protected
      against accident or malfeasance. This is why we're developing a
      network of active mirror sites -- not just to improve performance,
      but so one of them could take the baton if the SourceForge primary
      site had to shut down for some reason.
      
    It is good to see an acknowledgement of this concern from VA.
    SourceForge is a great resource, but it has led to an unprecedented
    concentration of free software projects in a single place.
    
    Other patches and updates released this week include:
    
      * Neil Brown has released [29]a RAID5 patch which should fix the
        filesystem corruption problems that people have been reporting.
      * Douglas Gilbert's [30]The Linux SCSI subsystem in 2.4 HOWTO has
        been accepted by the Linux Documentation Project. It describes the
        SCSI system from a user's perspective, with much useful
        information on SCSI configuration and operation.
      * [31]Dynamic Probes 1.3 has been released by Suparna Bhattacharya
        at IBM.
      * Heinz Mauelshagen has [32]released version 0.9.1beta2 of the
        Logical Volume Manager subsystem.
      * [33]A new multi-queue scheduling patch has been released by Mike
        Kravetz. It includes a set of benchmark results that would appear
        to indicate much improved performance when dealing with large
        numbers of processes.
      * Robert de Vries has posted [34]a version of the POSIX timers patch
        for the 2.4.0 kernel.
      * [35]Version 1.8 of the x86 performance monitoring counters driver
        has been released by Mikael Pettersson.
      * Rusty Russell posted [36]a patch fixing some 2.4.0 netfilter bugs.
      * A.M. Kuchling has written up [37]a look at Linux kernel
        development, comparing it (unfavorably) with how Python
        development is handled.
      * Greg K-H has released [38]a new version of the hotplug scripts
        package.
      * If you're looking for a kernel hacking task to jump into, there's
        a whole set waiting on the new [39]netfilter TODO list, maintained
        by Harald Welte.
      * David Miller has released [40]an updated version of his zero-copy
        networking patch.
      * Sam Watters has posted [41]the 'PAGG and Job module' for the 2.4.0
        kernel. It is a job-level accounting system which has been
        developed by Los Alamos National Laboratory and SGI. This module
        works with the [42]Comprehensive System Accounting package, also
        just released.
        
    Section Editor: [43]Jonathan Corbet
    January 25, 2001
    
    For other kernel news, see:
      * [44]Kernelnotes
      * [45]Kernel traffic
      * [46]Kernel Newsflash
      * [47]Kernel Trap
    
    Other resources:
      * [48]Kernel Source Reference
      * [49]L-K mailing list FAQ
      * [50]Linux-MM
      * [51]Linux Scalability Project
    
    
    
    
    [53]Eklektix, Inc. Linux powered! Copyright © 2001 [54]Eklektix, Inc.,
    all rights reserved
    Linux ® is a registered trademark of Linus Torvalds
 
 References
 
    1. http://lwn.net/
    2. http://ads.tucows.com/click.ng/pageid=001-012-132-000-000-003-000-000-012
    3. http://lwn.net/2001/0125/
    4. http://lwn.net/2001/0125/security.php3
    5. http://lwn.net/2001/0125/dists.php3
    6. http://lwn.net/2001/0125/devel.php3
    7. http://lwn.net/2001/0125/commerce.php3
    8. http://lwn.net/2001/0125/press.php3
    9. http://lwn.net/2001/0125/announce.php3
   10. http://lwn.net/2001/0125/history.php3
   11. http://lwn.net/2001/0125/letters.php3
   12. http://lwn.net/2001/0125/bigpage.php3
   13. http://lwn.net/2001/0118/kernel.php3
   14. http://lwn.net/2001/0125/a/2.4.1-pre10.php3
   15. http://lwn.net/2001/0125/a/2.4.0-ac11.php3
   16. http://lwn.net/2001/0111/kernel.php3
   17. http://lwn.net/2001/0125/a/splice.php3
   18. http://lwn.net/2001/0125/a/lt-splice.php3
   19. http://lwn.net/2001/0125/a/lt-devcopy.php3
   20. http://lwn.net/2001/0125/a/lt-devcopy2.php3
   21. http://lwn.net/2001/0125/a/lt-devcopy3.php3
   22. http://lwn.net/2001/0125/a/msg_more.php3
   23. http://lwn.net/2001/0125/a/ko-usb-patch.php3
   24. http://lwn.net/2001/0125/a/lt-usb-patch.php3
   25. http://lwn.net/2001/0125/a/modutils-2.4.2.php3
   26. http://lwn.net/2001/0125/a/kbuild-move.php3
   27. http://lwn.net/2001/0125/a/kbuild-worries.php3
   28. http://lwn.net/2001/0125/a/esr-sourceforge.php3
   29. http://lwn.net/2001/0125/a/raid5-patch.php3
   30. http://linuxdoc.org/HOWTO/SCSI-2.4-HOWTO/index.html
   31. http://lwn.net/2001/0125/a/dynamic-probes.php3
   32. http://lwn.net/2001/0125/a/lvm.php3
   33. http://lwn.net/2001/0125/a/mqs.php3
   34. http://lwn.net/2001/0125/a/posix-timers.php3
   35. http://lwn.net/2001/0125/a/pmcd.php3
   36. http://lwn.net/2001/0125/a/netfilter-patch.php3
   37. http://www.amk.ca/writing/linux-devel.html
   38. http://lwn.net/2001/0125/a/hotplug-scripts.php3
   39. http://lwn.net/2001/0125/a/netfilter-todo.php3
   40. http://lwn.net/2001/0125/a/zero-copy.php3
   41. http://lwn.net/2001/0125/a/pagg.php3
   42. http://lwn.net/2001/0125/a/csa.php3
   43. mailto:lwn@lwn.net
   44. http://www.kernelnotes.org/
   45. http://kt.linuxcare.com/
   46. http://www.atnf.csiro.au/~rgooch/linux/docs/kernel-newsflash.html
   47. http://www.kerneltrap.com/
   48. http://lksr.org/
   49. http://www.tux.org/lkml/
   50. http://www.linux.eu.org/Linux-MM/
   51. http://www.citi.umich.edu/projects/linux-scalability/
   52. http://lwn.net/2001/0125/dists.php3
   53. http://www.eklektix.com/
   54. http://www.eklektix.com/
 --- ifmail v.2.14.os7-aks1
  * Origin: Unknown (2:4615/71.10@fidonet)
 
 
