 - RU.LINUX ---------------------------------------------------------------------
 From : Sergey Lentsov                       2:4615/71.10   16 May 2002  22:36:37
 To : All
 Subject : URL: http://www.lwn.net/2002/0516/kernel.php3
 -------------------------------------------------------------------------------- 
 
    See also: [11]last week's Kernel page.
 
 Kernel development
 
    The current development kernel release is 2.5.15, released on May 9.
    Changes this time around include a resumption of the "device model"
    work (with an emphasis on the x86 PCI code), more IDE reworking
    (including the removal of /proc/ide - see [12]last week's LWN Kernel
    Page), an NFS server update, many patches from the "dj" series, and
    lots of other fixes and updates.
 
    The in-progress 2.5.16 patch, as seen in BitKeeper, includes an ISDN
    update, George Anzinger's [13]64-bit jiffies patch, the usual IDE
    patches, some networking updates, work on the new NFS export scheme,
    and more.
 
    Dave Jones's latest patch is [14]2.5.15-dj1, which contains a
    relatively small set of fixes and updates.
 
    The latest [15]2.5 status summary from Guillaume Boissiere is dated
    May 15.
 
    The current stable kernel release is 2.4.18. No 2.4.19 prepatches have
    been released by Marcelo this week.
 
    The current patch from Alan Cox is [16]2.4.19-pre8-ac4. The biggest
    change here is a new set of IDE updates by Andre Hedrick that went
    into -ac3. The 2.4 and 2.5 IDE subsystems continue to go in very
    different directions.
 
    On the 2.2 front, Alan has released [17]2.2.21-rc4, the latest 2.2.21
    release candidate. Unless something turns up, this one will become the
    real 2.2.21.
 
    The future of in-kernel web servers. Some recent discussion on
    troubles with khttpd, the in-kernel web server which has been present
    since the early 2.3 days, led to the statement that khttpd would soon
    be removed from the 2.5 series. khttpd has a number of happy users,
    but it has been essentially unmaintained for a number of years, and it
    has been superseded by Ingo Molnar's TUX server. So the kernel
    developers see little reason to keep it around.
 
    The more interesting question, perhaps, is whether TUX will take the
    place of khttpd. There appears to be little consensus on whether TUX
    should go in or not. Some developers are worried about the impact of
    the TUX patch, while others claim it affects little other code. It is
    not clear how much of a performance benefit TUX really provides - some
    user-space web servers are said to be getting quite close to TUX in
    speed. And, of course, a number of people feel that an application
    like a web server has no place inside the Linux kernel.
 
    Servers like TUX and khttpd remain interesting as a demonstration of
    how to create the shortest, fastest path between the network and files
    on a disk. Chances are that TUX will find its way into a mainline
    kernel sooner or later.
 
    Per-driver filesystems made easy. Alexander Viro has long been a
    proponent of small, special-purpose filesystems as a way for device
    drivers (or other kernel subsystems) to communicate with user space.
    The mini filesystem approach, he says, is a far cleaner and safer
    technique than the alternatives: /proc, the ioctl() call, or devfs.
    This approach makes sense to a number of people, but it has not been
    widely adopted. After all, if you are not Al Viro (which is the case
    for most of us), hacking up a new filesystem can be a little
    intimidating.
 
    So he has been trying for a while to make the task of writing driver
    filesystems easier. His [18]latest posting includes a set of library
    functions which mostly concern themselves with the creation of
    superblocks for virtual filesystems. The superblock is a good thing to
    hide within a library layer: virtual filesystems just need something
    to hand to the VFS, and there should be no need for each one to
    duplicate a lot of "fill in the superblock field" code.
 
    The other half of the posting is a driver which creates a little
    filesystem to export the values of a set of VIA motherboard temperature
    sensors. The whole thing takes up 70 lines of code, and much of that,
    of course, is dealing with getting information from the sensors. The
    task of creating special purpose virtual filesystems has indeed been
    made easy.
 
    The trickier part in the long run may be on the system administration
    side. If the mini filesystem approach takes off, each system will have
    to be configured to mount these filesystems in the right places. /proc
    files and ioctl() calls just show up in their standard places, but
    filesystems must be explicitly mounted somewhere. How are VIA
    motherboard users to know that they can mount a devvia filesystem
    somewhere to read their temperature sensors? Add in a dozen other
    hardware-specific filesystems and one begins to see that some work on
    system administration tools will be needed to make it all easy to
    manage.
 
    A different approach to asynchronous I/O. It started with a discussion
    of the O_DIRECT flag, which can be used to request that "direct" I/O
    be performed on a file. Direct I/O moves data directly between the
    userspace buffer and the device performing the I/O, without copying
    through kernel space. Direct I/O can be faster, since it avoids copy
    operations and because it does not fill the system's page cache with
    data that will not be used again.
 
    It was [19]noted recently that benchmarks using O_DIRECT tend to
    perform worse than those using regular, cached I/O. The reason for
    this poor performance is reasonably straightforward: direct I/O, as
    implemented in Linux, is synchronous. The application must sleep and
    wait for the operation to complete, and there is no opportunity to
    reorder operations for better I/O performance. If you really want to
    make O_DIRECT work well, you need to combine it with asynchronous I/O.
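
    As a rough illustration of the synchronous behavior described above,
    the following C sketch performs a single direct read. The function
    name direct_read(), the 4096-byte alignment, and the fallback to
    cached I/O are assumptions made for this example; direct I/O needs
    suitably aligned buffers and transfer sizes, and the real requirement
    depends on the device and filesystem.

```c
#define _GNU_SOURCE            /* exposes O_DIRECT in <fcntl.h> on Linux */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define DIRECT_ALIGN 4096      /* assumed alignment; real devices may differ */

/*
 * Read up to `len` bytes from the start of `path` into `out`, bypassing the
 * page cache when the filesystem allows it. Returns bytes read, or -1.
 * `len` must be a multiple of DIRECT_ALIGN.
 */
ssize_t direct_read(const char *path, void *out, size_t len)
{
    assert(len > 0 && len % DIRECT_ALIGN == 0);

    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0 && errno == EINVAL)
        fd = open(path, O_RDONLY);   /* no O_DIRECT support: use cached I/O */
    if (fd < 0)
        return -1;

    /* Direct I/O wants an aligned user buffer, so allocate one. */
    void *buf = NULL;
    if (posix_memalign(&buf, DIRECT_ALIGN, len) != 0) {
        close(fd);
        return -1;
    }

    /* This call blocks until the transfer is done: nothing overlaps with it. */
    ssize_t n = read(fd, buf, len);
    if (n < 0 && errno == EINVAL) {
        /* Some filesystems accept the flag at open() but reject the read. */
        int flags = fcntl(fd, F_GETFL);
        if (flags >= 0 && fcntl(fd, F_SETFL, flags & ~O_DIRECT) == 0)
            n = read(fd, buf, len);
    }
    if (n > 0)
        memcpy(out, buf, (size_t)n);

    free(buf);
    close(fd);
    return n;
}
```

    The caller sleeps inside read() for the full duration of the
    transfer, which is the cost that asynchronous I/O is meant to hide.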
 
    So, one would think, there would be a motivation to get the
    asynchronous I/O patches into the 2.5 kernel. Linus, however, has
    other ideas, based on [20]his opinion of O_DIRECT:
 
      The thing that has always disturbed me about O_DIRECT is that the
      whole interface is just stupid, and was probably designed by a
      deranged monkey on some serious mind-controlling substances.
 
    In other words, one might conclude that he doesn't like it.
 
    A statement like that, of course, raises an immediate question: how,
    exactly, would one design a high-performance, zero-copy, asynchronous
    I/O subsystem if one can't get the monkeys to share their substances?
    Linus's [21]answer is to split apart the two aspects of the problem:
    performing the I/O and connecting the data to user space.
 
    In this new scheme, a process wishing to do asynchronous, direct reads
    from a file would, after opening that file, invoke a new system call:
      readahead (file_desc, offset, size);
 
    This call will set the kernel to populating the system's page cache
    with data from the file starting at the given offset, for an amount
    approximating size. At this point, the data is in (kernel) memory, and
    is not visible to the userspace application. Actually getting at the
    data requires calling mmap with a special MAP_UNCACHED flag.
 
    This memory mapping is special in a couple of ways. One is that it
    does not set up any page tables when the mapping is established, so it
    happens very quickly. The other is that, when the user application
    generates a page fault (by trying to access the data it ordered with
    readahead()), the page is "stolen" from the page cache and turned into
    a private page belonging to the application. Until the fault happens,
    the read operation is entirely asynchronous; once the application
    actually tries to use the data, it will wait if the operation still
    has not completed.
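
    The read side can only be approximated with calls that exist today.
    Current Linux systems do provide a readahead() call taking these
    three arguments, but MAP_UNCACHED is a proposal only, so the C sketch
    below substitutes an ordinary MAP_PRIVATE mapping. That substitution
    changes the semantics: pages are shared with the page cache and
    copied on write, not "stolen" on first access. The helper name
    map_after_readahead() is hypothetical.

```c
#define _GNU_SOURCE            /* exposes readahead() in <fcntl.h> on Linux */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Ask the kernel to start reading [offset, offset + len) of `fd` into the
 * page cache, then map that range. Returns the mapping, or MAP_FAILED.
 * `offset` must be a multiple of the page size.
 */
void *map_after_readahead(int fd, off_t offset, size_t len)
{
    assert(len > 0);

    /* Queue the I/O. It is only a hint here, so a failure is not fatal. */
    (void)readahead(fd, offset, len);

    /*
     * Stand-in for the proposed MAP_UNCACHED mapping. Page tables are
     * filled lazily; the first access to a page faults and waits if that
     * page has not arrived yet.
     */
    return mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, offset);
}
```

    A caller would do other work between this call and its first access
    to the returned memory, then release the range with munmap().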
 
    If the application is, instead, looking to write data, it starts by
    populating its mapped memory segment. When things are ready to go,
    another new system call:
      mwrite (file_desc, address, length);
 
    is used. mwrite() puts the page back into the page cache (where it
    will get written eventually) and removes it from the process's page
    table. The (new) fdatasync_area() system call may be used to force
    (and wait for) specific pages to be written.
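
    mwrite() and fdatasync_area() are proposed calls, so they cannot be
    shown directly. A loose analogue with existing interfaces is to fill
    a MAP_SHARED mapping and then call msync() with MS_SYNC on the range,
    as in the C sketch below. This is an assumption-laden stand-in: the
    pages are never handed over to the page cache the way mwrite() would,
    and the function name mapped_write() is hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create (or truncate) `path`, write `len` bytes from `data` through a
 * shared mapping, and wait for them to be written back.
 * Returns 0 on success, -1 on error.
 */
int mapped_write(const char *path, const void *data, size_t len)
{
    assert(len > 0);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* A mapping cannot extend the file, so size it first. */
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }

    void *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* Step 1: populate the mapped segment. */
    memcpy(map, data, len);

    /* Step 2: force, and wait for, writeback of just this range. */
    int rc = msync(map, len, MS_SYNC);

    /* Step 3: remove the pages from this process's address space. */
    if (munmap(map, len) != 0)
        rc = -1;
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

    Passing MS_ASYNC instead of MS_SYNC would only schedule the write,
    which is closer to what mwrite() alone is described as doing.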
 
    A process which is simply copying data need never access the pages in
    the mapping directly. In this case, no page tables ever get built, and
    things go even more quickly. Pure copy cases are relatively rare,
    though, especially since this scheme would not support I/O to network
    connections (which do not use the page cache). The high-profile
    application for this sort of I/O (or O_DIRECT) is Oracle, which
    performs lots of I/O out of large segments.
 
    So far, all this is just a scheme sketched out by Linus, with no
    implementation to play with. Should some ambitious kernel hacker code
    it up, however, it would be interesting to see how it really performs
    relative to other techniques.
 
    Corrections on the buffer head work. Andrew Morton politely pointed
    out that your editor was more confused than usual when writing about
    Andrew's buffer head work last week. The bulk of that work actually
    affected the way the write() system call was handled. In the old
    scheme, data to be written back to files would find its way into the
    buffer head least-recently-used queue, where it would eventually be
    flushed to disk. With the new code, this data is written directly from
    the page cache, in a more page-oriented mode.
 
    Buffer heads are still used to coordinate the I/O process, for now. As
    a result of all the block layer work that has gone in, the block
    system now takes those buffer heads and digs down to the real pages
    underneath them. At some point, an obvious step will be to remove
    the buffer head "middleman" and submit pages to be written directly
    to the block layer. Eventually, then, buffer heads will no longer be
    the main mechanism for block I/O.
 
    Sorry for the confusion.
 
    Other patches and updates released this week include:
 
    Kernel trees:
      * Martin Loschwitz: [22]2.5.15-ml2; looks like 2.5.15 plus recent,
        mainstream patches.
      * Joerg Prante: [23]2.4.19-pre8-jp12; ALSA, JFS, XFS, RMAP,
        preemptible kernel, FreeS/WAN, etc.
      * J.A. Magallon: [24]2.4.19-pre8-jam2.
      * Andrea Arcangeli: [25]2.4.19-pre8-aa3.
 
    Core kernel code:
      * Rik van Riel: [26]I/O wait statistics.
      * Rusty Russell: [27]Futex update.
      * Hugh Dickins: [28]noht boot option to disable hyperthreading.
      * Patricia Gaughen: [29]discontiguous memory support for ia32 NUMA
        systems.
      * Hanna Linder: [30]fast walk dcache for 2.4.19-pre8.
      * Rusty Russell: hotplug CPU preparation, mostly dealing with the
        management of idle tasks on new CPUs ([31]I, [32]II, [33]III,
        [34]IV, and [35]V).
 
    Device drivers:
      * Martin Dalecki: IDE reworking ([36]59, [37]60, [38]61, [39]62a
        (Linus [40]didn't like 62), [41]63, and [42]64).
      * Bakonyi Ferenc: [43]RivaTV driver 0.8.0.
      * Denis Oliver Kropp: [44]VMware framebuffer driver, version 0.5.2.
      * Richard Gooch: [45]devfs v199.14 for 2.4.19-pre8 and [46]version
        213 for 2.5.15.
      * Johannes Erdfelt: [47]rework USB device reference counting.
      * Greg Kroah-Hartman: [48]further rework USB reference counting.
      * Neil Brown: make RAID 5 work in 2.5 ([49]1, [50]2, and [51]3).
 
    Filesystems:
      * Anton Altaparmakov: [52]NTFS 2.0.7.
      * Pawel Kot: [53]backport of NTFS 2.0.7 for 2.4.18.
      * Jan Harkes: new iget_locked() function for inode creation ([54]1,
        [55]2, [56]3, [57]4, [58]5, and [59]6).
      * Peter Chubb: [60]remove 2TB filesystem size limit.
      * Hirotaka Sasaki: [61]alternative patch to remove the 2TB limit.
 
    Kernel building:
      * Keith Owens: [62]kbuild 2.5 core-14. Keith has also posted
        [63]another note stating that kbuild is ready for inclusion.
      * Andi Kleen: add a [64]CONFIG_ISA option.
 
    Miscellaneous:
      * Denis Vlasenko: [65]kernel maintainers file.
      * Karim Yaghmour: [66]Linux Trace Toolkit for 2.5.15.
      * Neil Brown: [67]mdadm tool 1.0.0 for the management of RAID sets.
      * Greg Kroah-Hartman: [68]pcihpview 0.3, a GUI tool for PCI hotplug
        management.
      * Patricia Gaughen: [69]updated NUMA status page.
      * Jari Ruusu: [70]loop-AES 1.6c file and swap crypto package.
      * [71]Kernel Traffic #166 is available.
 
    Ports:
      * James Bottomley: [72]NCR Voyager port.
      * James Bottomley: [73]split up i386 code into subarchitectures.
      * Robert Love: [74]preemptible kernel for MIPS processors for
        2.4.19-pre8.
 
    Section Editor: [75]Jonathan Corbet
    May 16, 2002
 
    For other kernel news, see:
      * [79]Kernel traffic
      * [80]Kernel Newsflash
      * [81]Kernel Trap
      * [82]2.5 Status
    
    Other resources:
      * [83]L-K mailing list FAQ
      * [84]Linux-MM
      * [85]Linux Scalability Effort
      * [86]Kernel Newbies
      * [87]Linux Device Drivers
    
    
 
    [89]Eklektix, Inc. Linux powered! Copyright © 2002 [90]Eklektix, Inc.,
    all rights reserved
    Linux (R) is a registered trademark of Linus Torvalds
 
 References
 
    1. http://lwn.net/
    2. http://lwn.net/2002/0516/
    3. http://lwn.net/2002/0516/security.php3
    4. http://lwn.net/2002/0516/dists.php3
    5. http://lwn.net/2002/0516/devel.php3
    6. http://lwn.net/2002/0516/commerce.php3
    7. http://lwn.net/2002/0516/press.php3
    8. http://lwn.net/2002/0516/announce.php3
    9. http://lwn.net/2002/0516/letters.php3
   10. http://lwn.net/2002/0516/bigpage.php3
   11. http://lwn.net/2002/0509/kernel.php3
   12. http://lwn.net/2002/0509/kernel.php3
   13. http://lwn.net/2002/0516/a/64-bit-jiffies.php3
   14. http://lwn.net/2002/0516/a/2.5.15-dj1.php3
   15. http://lwn.net/2002/0516/a/2.5-status.php3
   16. http://lwn.net/2002/0516/a/2.4.19-pre8-ac4.php3
   17. http://lwn.net/2002/0516/a/2.2.21-rc4.php3
   18. http://lwn.net/2002/0516/a/driverfs-made-easy.php3
   19. http://lwn.net/2002/0516/a/O_DIRECT-performance.php3
   20. http://lwn.net/2002/0516/a/lt-deranged-monkey.php3
   21. http://lwn.net/2002/0516/a/lt-async.php3
   22. http://lwn.net/2002/0516/a/2.5.15-ml2.php3
   23. http://lwn.net/2002/0516/a/2.4.19-pre8-jp12.php3
   24. http://lwn.net/2002/0516/a/2.4.19-pre8-jam2.php3
   25. http://lwn.net/2002/0516/a/2.4.19-pre8-aa3.php3
   26. http://lwn.net/2002/0516/a/iowait-stats.php3
   27. http://lwn.net/2002/0516/a/futex.php3
   28. http://lwn.net/2002/0516/a/noht.php3
   29. http://lwn.net/2002/0516/a/discontig.php3
   30. http://lwn.net/2002/0516/a/fastwalk.php3
   31. http://lwn.net/2002/0516/a/hotplug-cpu-1.php3
   32. http://lwn.net/2002/0516/a/hotplug-cpu-2.php3
   33. http://lwn.net/2002/0516/a/hotplug-cpu-3.php3
   34. http://lwn.net/2002/0516/a/hotplug-cpu-4.php3
   35. http://lwn.net/2002/0516/a/hotplug-cpu-5.php3
   36. http://lwn.net/2002/0516/a/ide-59.php3
   37. http://lwn.net/2002/0516/a/ide-60.php3
   38. http://lwn.net/2002/0516/a/ide-61.php3
   39. http://lwn.net/2002/0516/a/ide-62a.php3
   40. http://lwn.net/2002/0516/a/lt-ide-62.php3
   41. http://lwn.net/2002/0516/a/ide-63.php3
   42. http://lwn.net/2002/0516/a/ide-64.php3
   43. http://lwn.net/2002/0516/a/rivatv.php3
   44. http://lwn.net/2002/0516/a/vmwarefb.php3
   45. http://lwn.net/2002/0516/a/devfs-v199.14.php3
   46. http://lwn.net/2002/0516/a/devfs-v213.php3
   47. http://lwn.net/2002/0516/a/usb-refcount.php3
   48. http://lwn.net/2002/0516/a/usb-refcount-2.php3
   49. http://lwn.net/2002/0516/a/raid-1.php3
   50. http://lwn.net/2002/0516/a/raid-2.php3
   51. http://lwn.net/2002/0516/a/raid-3.php3
   52. http://lwn.net/2002/0516/a/ntfs.php3
   53. http://lwn.net/2002/0516/a/ntfs-207a.php3
   54. http://lwn.net/2002/0516/a/iget-1.php3
   55. http://lwn.net/2002/0516/a/iget-2.php3
   56. http://lwn.net/2002/0516/a/iget-3.php3
   57. http://lwn.net/2002/0516/a/iget-4.php3
   58. http://lwn.net/2002/0516/a/iget-5.php3
   59. http://lwn.net/2002/0516/a/iget-6.php3
   60. http://lwn.net/2002/0516/a/2tb.php3
   61. http://lwn.net/2002/0516/a/2tb-2.php3
   62. http://lwn.net/2002/0516/a/kbuild.php3
   63. http://lwn.net/2002/0516/a/kbuild-ready.php3
   64. http://lwn.net/2002/0516/a/config-isa.php3
   65. http://lwn.net/2002/0516/a/maintainers.php3
   66. http://lwn.net/2002/0516/a/ltt.php3
   67. http://lwn.net/2002/0516/a/mdadm.php3
   68. http://lwn.net/2002/0516/a/pcihpview.php3
   69. http://lse.sf.net/numa/numastatus.html
   70. http://lwn.net/2002/0516/a/loop-aes.php3
   71. http://kt.zork.net/kernel-traffic/kt20020513_166.html
   72. http://lwn.net/2002/0516/a/voyager.php3
   73. http://lwn.net/2002/0516/a/i386-split.php3
   74. http://lwn.net/2002/0516/a/preempt-mips.php3
   75. mailto:lwn@lwn.net
   76. http://oasis.lwn.net/oasisc.php?s=5&c=30&cb=1981253904&url=http%3A%2F%2Fjobs.pnl.gov%2Fasp%2FReqDescr%2FReqDescr.asp%3Fv_ReqNbr%3D103909%26company%3DPNL
   77. http://oasis.lwn.net/oasisc.php?s=5&c=30&cb=1981253904&url=http%3A%2F%2Fjobs.pnl.gov%2Fasp%2FReqDescr%2FReqDescr.asp%3Fv_ReqNbr%3D103909%26company%3DPNL
   78. http://oasis.lwn.net/oasisc.php?s=5&c=30&cb=1981253904&url=http%3A%2F%2Fwww.pnl.gov%2Fnews%2F2002%2Fcomputer.htm
   79. http://kt.zork.net/
   80. http://www.atnf.csiro.au/~rgooch/linux/docs/kernel-newsflash.html
   81. http://www.kerneltrap.com/
   82. http://kernelnewbies.org/status/
   83. http://www.tux.org/lkml/
   84. http://linux-mm.org/
   85. http://lse.sourceforge.net/
   86. http://www.kernelnewbies.org/
   87. http://www.xml.com/ldd/chapter/book/index.html
   88. http://lwn.net/2002/0516/dists.php3
   89. http://www.eklektix.com/
   90. http://www.eklektix.com/
 
 --- ifmail v.2.14.os7-aks1
  * Origin: Unknown (2:4615/71.10@fidonet)
 
 
