 - RU.LINUX ---------------------------------------------------------------------
 From : Sergey Lentsov                       2:4615/71.10   28 Jan 2002  20:54:18
 To : All
 Subject : URL: http://www.lwn.net/2002/0124/kernel.php3
 -------------------------------------------------------------------------------- 
 
    See also: [12]last week's Kernel page.
    
 Kernel development
 
    The current development kernel release is still 2.5.2. Linus has
    issued 2.5.3 prepatches up to [13]2.5.3-pre4; this prepatch has a
    working ATA/IDE layer, a reworking of the in-core inode structure, and
    a lot of fixes.
    
    Those interested in what is happening with the 2.5 kernel may want to
    have a look at [14]Guillaume Boissiere's 2.5 Status summary. This
    summary is now also available [15]on the KernelNewbies.org site.
    
    The current stable kernel release is 2.4.17. Marcelo's latest prepatch
    is [16]2.4.18-pre7, which contains quite a few fixes and updates, and
    not much else. A 2.4.18 release candidate has not yet been issued.
    
    Other kernel tree releases: Dave Jones has released [17]2.5.2-dj4;
    this one is updated through 2.5.3-pre2. It adds a more recent version
    of the scheduler patch, various fixes, and a major update which makes
    all input devices use the input layer. As a result, people who run
    this kernel will have to enable some new configuration options, and
    probably change their X server configuration as well - unless, of
    course, they have no keyboards or mice to worry about. See [18]Vojtech
    Pavlik's note for more information on what is required.
    
    Andrea Arcangeli has announced [19]2.4.18-pre4-aa1. The most
    interesting thing in this patch, perhaps, is a change which lets the
    kernel put page tables in high memory. "It only compiles on x86 and it
    is still a bit experimental. I couldn't reproduce problems yet
    though."
    
    [20]2.0.40-rc2 has been released by David Weinehall. This one will go
    out as the real 2.0.40 stable release unless somebody comes up with a
    good reason why it shouldn't.
    
    What Rik van Riel is up to. VM hacker Rik van Riel may have suffered a
    bit of a setback when Linus replaced his code with an alternative
    virtual memory implementation in the 2.4 series. He has not, however,
    given up on VM work; instead, he has been steadily releasing a set of
    "reverse mapping" ("rmap") patches for the 2.4 kernels. The rmap code
    was incorporated into 2.4.18-pre3-ac2 by Alan Cox, who [21]compares
    the rmap VM favorably with the much-respected FreeBSD memory
    implementation.
    
    So perhaps it's time to give rmap a look, starting with a bit of
    superficial background. Linux, of course, is a virtual memory system.
    This implies that every address generated by a process must be looked
    up (by the hardware) in a page table. The page table entry (PTE) will
    either map the address onto a physical memory address or note that the
    page is not present.
    
    A key point is that a page table is a one-way mapping. Given a virtual
    address in a process's context, the corresponding physical address may
    be found. But there is, in the stock kernel, no easy way to find which
    page table entry (in which process's context) refers to a specific
    page. Things are complicated by the fact that pages can be shared
    among processes as well; if a page in memory holds code from the C
    library, for example, there will be many page table entries pointing
    to it. The only way to find them is to scan every process's page
    tables, looking for matching entries. That is a long and expensive
    task.
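
    The asymmetry can be made concrete with a small sketch. This is a toy
    model, not kernel code: it assumes a single-level page table per
    process, and the names (struct process, lookup_frame(),
    count_mappings()) are invented for illustration. The forward lookup
    is a single array access; the reverse lookup has to examine every
    entry of every process.

```c
#include <stddef.h>

#define PTES_PER_TABLE 1024      /* toy size, one flat table per process */
#define PTE_PRESENT    0x1UL
#define PAGE_SHIFT     12        /* 4KB pages */

typedef unsigned long pte_t;

/* Hypothetical stand-in for a process's address space. */
struct process {
    pte_t page_table[PTES_PER_TABLE];
};

/* Forward mapping: virtual page number -> physical frame number.
 * Returns -1 if the page is out of range or not present. */
long lookup_frame(const struct process *p, size_t vpn)
{
    if (vpn >= PTES_PER_TABLE)
        return -1;
    pte_t pte = p->page_table[vpn];
    if (!(pte & PTE_PRESENT))
        return -1;
    return (long)(pte >> PAGE_SHIFT);
}

/* Reverse lookup without reverse mappings: scan every PTE of every
 * process. Returns the number of PTEs that map `frame` and stores the
 * number of entries examined in *ptes_examined. */
size_t count_mappings(const struct process *procs, size_t nprocs,
                      unsigned long frame, size_t *ptes_examined)
{
    size_t found = 0;

    *ptes_examined = 0;
    for (size_t i = 0; i < nprocs; i++) {
        for (size_t v = 0; v < PTES_PER_TABLE; v++) {
            pte_t pte = procs[i].page_table[v];

            (*ptes_examined)++;
            if ((pte & PTE_PRESENT) && (pte >> PAGE_SHIFT) == frame)
                found++;
        }
    }
    return found;
}
```

    In this model the cost of the reverse lookup grows with the number
    of processes times the size of their page tables, regardless of how
    few entries actually point at the page in question.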
    
    The one-way nature of the Linux VM makes memory management tasks
    harder. Before the kernel can free a given page in physical memory, it
    must find and mark every page table entry pointing to that page. This
    is done by scanning page tables and "freeing" pages by invalidating
    page table entries and decrementing the reference counts on the
    corresponding pages. When the reference count goes to zero, the system
    knows that the page is now truly free.
    
    Managing physical memory by scanning virtual memory is inefficient.
    Many tables may have to be scanned before a given page can be freed.
    And if the kernel has a pressing need to free pages in a particular
    zone (subsection of physical memory), scanning virtual memory is not
    particularly helpful. It may be necessary to look at a large number of
    PTEs before finding a single page in the right zone.
    
    The solution to these problems is reverse mapping, the creation of a
    data structure which, given a physical page, can return a list of PTEs
    which point to that page. The logical place for this information is
    the system memory map, which is an array of struct page structures,
    one for each page of physical memory on the system. Rik's patch adds a
    pte_chain member to the page structure; it points to a simple linked
    list of pointers to PTEs. Access is thus simple; if you have a
    physical page you want to work with, just go to its page structure and
    follow the chain.
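
    A minimal user-space sketch of such a structure follows. Only the
    pte_chain member and struct page come from the description above;
    the field layout, the flag value, and the helper names (rmap_add(),
    rmap_unmap_all()) are assumptions made for illustration and are not
    taken from Rik's patch.

```c
#include <stdlib.h>

typedef unsigned long pte_t;     /* toy PTE: just a word of flag bits */

#define PTE_PRESENT 0x1UL

/* One link in the reverse-mapping list: a pointer to a PTE that maps
 * the page, plus the next link. */
struct pte_chain {
    struct pte_chain *next;
    pte_t *ptep;
};

/* Stand-in for the kernel's struct page, reduced to what is used here. */
struct page {
    int count;                   /* number of PTEs mapping this page */
    struct pte_chain *pte_chain; /* head of the reverse-mapping list */
};

/* Record that *ptep now maps `page`. Returns 0, or -1 if out of memory. */
int rmap_add(struct page *page, pte_t *ptep)
{
    struct pte_chain *pc = malloc(sizeof(*pc));

    if (!pc)
        return -1;
    pc->ptep = ptep;
    pc->next = page->pte_chain;
    page->pte_chain = pc;
    *ptep |= PTE_PRESENT;
    page->count++;
    return 0;
}

/* Invalidate every PTE that maps `page` by walking its chain.
 * Returns the number of mappings removed. */
int rmap_unmap_all(struct page *page)
{
    int removed = 0;

    while (page->pte_chain) {
        struct pte_chain *pc = page->pte_chain;

        page->pte_chain = pc->next;
        *pc->ptep = 0;           /* clear the mapping */
        free(pc);
        page->count--;
        removed++;
    }
    return removed;
}
```

    The work done by rmap_unmap_all() depends only on how many PTEs map
    the page, not on how many processes or page tables exist.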
    
    Once you have that capability, a number of things become possible.
    Freeing a page is now straightforward, since all of the relevant PTEs
    can be found and modified at once. It is also easier to keep track of
    which pages are actually being used; follow the pte_chain and check
    each entry's "referenced" bit, and adjust the page's "age"
    accordingly. This information will help the VM system pick the right
    pages to throw out. If memory is tight in a particular zone, the
    physical pages in that zone can be scanned directly without having to
    sift through tremendous numbers of irrelevant page table entries. All
    of these features will help to create a more responsive and stable VM
    under varying loads.
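
    The page aging step might look roughly like the sketch below. The
    aging policy shown (add a fixed step when referenced, halve
    otherwise) and all of the constants and function names are
    assumptions chosen for illustration; the text above says only that
    the referenced bits are checked and the age adjusted.

```c
#include <stddef.h>

typedef unsigned long pte_t;

#define PTE_REFERENCED 0x2UL     /* toy "accessed" bit */
#define AGE_STEP       3         /* hypothetical aging constants */
#define AGE_MAX        64

struct pte_chain {
    struct pte_chain *next;
    pte_t *ptep;
};

struct page {
    int age;
    struct pte_chain *pte_chain;
};

/* Test and clear the referenced bit in every PTE mapping the page.
 * Returns 1 if any mapping had been referenced, 0 otherwise. */
int page_was_referenced(struct page *page)
{
    int referenced = 0;

    for (struct pte_chain *pc = page->pte_chain; pc; pc = pc->next) {
        if (*pc->ptep & PTE_REFERENCED) {
            referenced = 1;
            *pc->ptep &= ~PTE_REFERENCED;
        }
    }
    return referenced;
}

/* Raise the age of recently used pages and decay the age of idle ones;
 * pages whose age reaches zero would be candidates for eviction. */
void page_update_age(struct page *page)
{
    if (page_was_referenced(page)) {
        page->age += AGE_STEP;
        if (page->age > AGE_MAX)
            page->age = AGE_MAX;
    } else {
        page->age /= 2;
    }
}
```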
    
    There is a cost, of course. The page structure has a new field, the
    pte_chain pointer. Then there are linked list entries for the reverse
    mappings. This extra memory usage matters. As a simplified, "back of
    the envelope" calculation, consider a 32-bit system with 128MB of main
    memory, using 4KB pages, and with exactly one PTE for every physical
    page. This system has 32768 pages; the overhead for the pte_chain, at
    12 bytes/page, will occupy almost 400KB of memory - 96 pages. That is
    a substantial increase in the kernel's memory use - some would call it
    severe bloat.
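
    The arithmetic can be checked with a few lines of C. The function
    names are invented for this sketch. The 12 bytes/page figure is
    taken from the text; one plausible breakdown on a 32-bit system
    (an assumption, not stated above) is a 4-byte pte_chain pointer in
    the page structure plus an 8-byte list entry holding two pointers.

```c
#include <stdio.h>

/* Bytes of reverse-mapping bookkeeping for a machine with `mem_bytes`
 * of RAM, assuming exactly one PTE per physical page. */
unsigned long rmap_overhead_bytes(unsigned long mem_bytes,
                                  unsigned long page_size,
                                  unsigned long bytes_per_page)
{
    unsigned long nr_pages = mem_bytes / page_size;

    return nr_pages * bytes_per_page;
}

/* Print the example from the text: 128MB of RAM, 4KB pages, 12 bytes
 * of overhead per page. */
void print_rmap_overhead(void)
{
    unsigned long page_size = 4096;
    unsigned long mem = 128UL << 20;
    unsigned long bytes = rmap_overhead_bytes(mem, page_size, 12);

    printf("%lu pages, %lu bytes of overhead (%lu KB, %lu pages)\n",
           mem / page_size, bytes, bytes / 1024, bytes / page_size);
}
```

    For these inputs the result is 393216 bytes, which is 384KB or 96
    pages, in line with the "almost 400KB" quoted above.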
    
    The memory used for reverse mappings is actually pretty small compared
    to the full overhead of the VM system. Even so, the rmap patch tries
    to mitigate that impact somewhat with another change. The standard
    Linux page structure includes a wait queue for any process that needs
    to perform an exclusive operation on the page. That wait_queue_head_t
    field takes up 12 bytes (i386 architecture, at least) and tends to be
    unused much of the time. It is not often that a process must actually
    wait on a page. So, in the rmap patch, the wait queue has been removed
    from the page structure. Instead, a much smaller list of wait queues
    is maintained; for a given physical page, a hash function is used to
    find the associated wait queue. Occasional collisions will occur,
    resulting in processes waking up when their pages are not yet ready.
    That is a performance penalty that, with clever coding, should be far
    outweighed by the memory savings.
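
    The hashed wait queue idea can be sketched as follows. The table
    size, the multiplicative hash, and the names used here are
    assumptions for illustration only; the text above says just that a
    hash function maps a physical page to one of a smaller set of wait
    queues.

```c
#include <stdint.h>

#define WAIT_TABLE_BITS 8
#define WAIT_TABLE_SIZE (1u << WAIT_TABLE_BITS)  /* 256 shared queues */

/* Toy wait queue; a real one would hold a lock and a list of sleepers. */
struct wait_queue {
    int nr_waiters;
};

/* One small table shared by all pages, instead of one queue per page. */
struct wait_queue wait_table[WAIT_TABLE_SIZE];

/* Map a page's index in the memory map to a bucket using a
 * multiplicative hash, keeping the top WAIT_TABLE_BITS bits. */
unsigned int page_wait_hash(unsigned long page_index)
{
    uint32_t h = (uint32_t)page_index * 2654435761u;

    return h >> (32 - WAIT_TABLE_BITS);
}

/* Find the wait queue a page shares with every other page that hashes
 * to the same bucket. A wakeup on this queue can therefore wake
 * sleepers whose own page is not ready yet; they must recheck their
 * page and go back to sleep if necessary. */
struct wait_queue *page_waitqueue(unsigned long page_index)
{
    return &wait_table[page_wait_hash(page_index)];
}
```

    With far more pages than queues, collisions are unavoidable; the bet
    is that waiting on a page is rare enough for the spurious wakeups
    to cost less than a per-page queue would.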
    
    The patch contains some other bits, such as a simple "defragmenter"
    which tries to make large, contiguous memory allocations work (though
    most of the implementation work remains to be done), and a "drop
    behind" function which frees up pages belonging to files when a
    process is doing sequential I/O and has passed over them. There is
    also a more structured approach to "inactive" pages - pages which have
    been taken away from a process but which still contain the
    (potentially useful) data. The new code tries to keep around a fair
    number of clean, inactive pages; these pages can be quickly given back
    to their processes if called for, but are also available for
    allocation elsewhere if need be. Finally, the patch adds a fair number
    of general cleanups and a lot of comments.
    
    Rik's patch has drawn a number of favorable reviews. For now, however,
    it is not being proposed for inclusion into 2.5. Indeed, it is only
    available for the 2.4 kernel series. Rik is currently working with 2.4
    only as a way of having a stable base to start from. VM hacking can
    lead to weird and subtle bugs; it's not helpful if the rest of the
    kernel is also in great flux with bugs of its own. There will
    eventually be a 2.5 version, Rik tells us, when things have calmed
    down and the rmap patch itself is in a more finished state.
    
    The current rmap version is [22]release 12a.
    
    What is up with the Athlon bug? The word first showed up on the
    [23]Gentoo Linux site: a bug in the AMD Athlon CPU could cause data
    corruption on Linux systems. The claim was that the problem had to do
    with what happens when 4MB pages are invalidated by the processor; the
    workaround was to tell the kernel to run without large pages with the
    mem=nopentium boot option.
    
    The only problem is this: the Linux kernel only uses 4MB pages for
    kernel space itself. It maps all of (low) memory using large pages,
    then leaves the mapping alone - 4MB pages are never invalidated. The
    explanation left many kernel hackers unsatisfied, and the
    investigation continued.
    
    What is actually going on, as [24]posted by Gentoo's Daniel Robbins,
    is rather more subtle. The kernel's 4MB mappings cover all of (low)
    physical memory, including things like AGP memory. In some situations,
    the CPU can generate "speculative writes" to that memory via the 4MB
    mapping, and this has the effect of loading a cache line with data
    from memory. That cache line will eventually be written back to memory
    (even though the "speculative write" is never executed and the data
    has not been changed); unfortunately, the AGP processor may have
    modified the underlying memory in the meantime. The cached data is
    thus stale and incorrect, and corrupts things.
    
    Real fixes are still in the works. Meanwhile the mem=nopentium option
    will work for people who are affected by this problem.
    
    Creeping ACPI. Jes Sorensen [25]tracked down a problem with his shiny
    new Vaio laptop; it seems that the interrupt line for his CardBus
    controller was not getting set up properly. He has posted a small,
    special-purpose fix which patches things up in that case.
    
    The underlying problem, however, remains. Many of the older,
    BIOS-level hardware tables which have traditionally been used to
    configure things like interrupts are going away. Instead, the newer
    ACPI standard is being used. If the kernel is to be able to work with
    newer hardware, it will need a functioning ACPI implementation,
    including the AML interpreter.
    
    Running the full-blown ACPI setup is not an entirely popular idea, as
    was discussed on this page [26]last July. ACPI brings substantial
    amounts of kernel bloat, reliability worries, and security concerns.
    Many (or most) people who have really looked over ACPI tend to be
    unenthusiastic about putting it into their kernels.
    
    Finding a solution that allows future hardware to work without
    equipping the Linux kernel with an interpreter that can run arbitrary,
    closed source code is going to be a challenge. Proposals for a
    "configure and dump" mode for ACPI will address the bloat concerns,
    but not the others. It will not be a good day when Linux can configure
    a disk drive, but only at the cost of running a bunch of closed, buggy
    AML code with, perhaps, some "digital rights management" software
    thrown in as a bonus.
    
    Other patches and updates released this week include:
    
      * Daniel Quinlan has posted [27]a set of cramfs updates for the
        2.4.17 kernel.
      * Olaf Dietsche has [28]released version 0.2 of his accessfs
        administrative filesystem.
      * A [29]new version of the NCR Voyager patch was posted by James
        Bottomley.
      * Ben LaHaise has [30]posted a patch which allows experimental
        system calls to be exported to user space by name.
      * [31]EVMS 0.9.0, the first beta release of the Enterprise Volume
        Management System, was announced by Kevin Corry.
      * Greg Kroah-Hartman has [32]announced a PCI hotplug controller
        driver for IBM motherboards.
      * [33]kdb v2.1 for 2.4.17 was released by Keith Owens.
      * Kazuyoshi Serizawa has [34]announced the release of the Linux
        Kernel State Tracer debugging tool.
      * [35]Kernel Traffic for January 21 is available.
      * A [36]2.5.2 kernel API change summary was posted by Andreas Bombe.
      * The usual set of devfs updates is available from Richard Gooch:
        [37]devfs-v199.8 (for 2.4.18-pre4), [38]devfs-v207 (for 2.5.x),
        and [39]devfsd-v1.3.22 (for user space).
      * Ingo Molnar's latest announced scheduler patch is [40]version -J4;
        this version is included in 2.5.2-dj4, but not in 2.5.3-pre3.
        There is also [41]a version of the patch for the 2.4 kernel
        available.
      * UVFS, "yet another user space filesystem kit," was [42]announced
        by Britt Park.
      * A new [43]wireless driver API patch was posted by Jean Tourrilhes.
      * Eric Raymond's current CML2 patch is [44]CML2-2.1.8.
      * [45]Version 0.3.7 of the asynchronous I/O patch is available from
        Ben LaHaise.
      * A new "linmodem" driver for Conexant HSF softmodems was
        [46]announced by Marc Boucher.
        
    Section Editor: [47]Jonathan Corbet
    January 24, 2002
    
    For other kernel news, see:
      * [48]Kernel traffic
      * [49]Kernel Newsflash
      * [50]Kernel Trap
      * [51]Linux 2.5.x Porting help
    
    Other resources:
      * [52]Kernel Source Reference
      * [53]L-K mailing list FAQ
      * [54]Linux-MM
      * [55]Linux Scalability Effort
      * [56]Kernel Newbies
      * [57]Linux Device Drivers
    
    
    
                                                   [58]Next: Distributions
    
    [59]Eklektix, Inc. Linux powered! Copyright (c) 2002 [60]Eklektix, Inc.,
    all rights reserved
    Linux (R) is a registered trademark of Linus Torvalds
 
 References
 
    1. http://lwn.net/
    2. http://lwn.net/2002/0124/
    3. http://lwn.net/2002/0124/security.php3
    4. http://lwn.net/2002/0124/dists.php3
    5. http://lwn.net/2002/0124/devel.php3
    6. http://lwn.net/2002/0124/commerce.php3
    7. http://lwn.net/2002/0124/press.php3
    8. http://lwn.net/2002/0124/announce.php3
    9. http://lwn.net/2002/0124/history.php3
   10. http://lwn.net/2002/0124/letters.php3
   11. http://lwn.net/2002/0124/bigpage.php3
   12. http://lwn.net/2002/0117/kernel.php3
   13. http://lwn.net/2002/0124/a/2.5.3-pre4.php3
   14. http://lwn.net/2002/0124/a/2.5-status.php3
   15. http://www.kernelnewbies.org/status/status.html
   16. http://lwn.net/2002/0124/a/2.4.18-pre7.php3
   17. http://lwn.net/2002/0124/a/2.5.2-dj4.php3
   18. http://lwn.net/2002/0124/a/dj4-input.php3
   19. http://lwn.net/2002/0124/a/2.4.18-pre4-aa1.php3
   20. http://lwn.net/2002/0124/a/2.0.40-rc2.php3
   21. http://lwn.net/2002/0124/a/rmap-freebsd.php3
   22. http://lwn.net/2002/0124/a/rmap-12a.php3
   23. http://www.gentoo.org/
   24. http://lwn.net/2002/0124/a/athlon-agp-problem.php3
   25. http://lwn.net/2002/0124/a/vaio-fix.php3
   26. http://lwn.net/2001/0704/kernel.php3
   27. http://lwn.net/2002/0124/a/cramfs.php3
   28. http://lwn.net/2002/0124/a/accessfs.php3
   29. http://lwn.net/2002/0124/a/voyager.php3
   30. http://lwn.net/2002/0124/a/vsyscalls.php3
   31. http://lwn.net/2002/0124/a/evms.php3
   32. http://lwn.net/2002/0124/a/ibm-pci-hotplug.php3
   33. http://lwn.net/2002/0124/a/kdb.php3
   34. http://lwn.net/2002/0124/a/lkst.php3
   35. http://kt.zork.net/kernel-traffic/kt20020121_151.html
   36. http://lwn.net/2002/0124/a/api-change.php3
   37. http://lwn.net/2002/0124/a/devfs-v199.8.php3
   38. http://lwn.net/2002/0124/a/devfs-v207.php3
   39. http://lwn.net/2002/0124/a/devfsd-v1.3.22.php3
   40. http://lwn.net/2002/0124/a/scheduler.php3
   41. http://lwn.net/2002/0124/a/scheduler-2.4.php3
   42. http://lwn.net/2002/0124/a/uvfs.php3
   43. http://lwn.net/2002/0124/a/wireless-api.php3
   44. http://lwn.net/2002/0124/a/cml.php3
   45. http://lwn.net/2002/0124/a/aio.php3
   46. http://lwn.net/2002/0124/a/hsf-modem.php3
   47. mailto:lwn@lwn.net
   48. http://kt.zork.net/
   49. http://www.atnf.csiro.au/~rgooch/linux/docs/kernel-newsflash.html
   50. http://www.kerneltrap.com/
   51. http://www.osdl.org/archive/rddunlap/linux-port-25x.html
   52. http://lksr.org/
   53. http://www.tux.org/lkml/
   54. http://www.linux.eu.org/Linux-MM/
   55. http://lse.sourceforge.net/
   56. http://www.kernelnewbies.org/
   57. http://www.xml.com/ldd/chapter/book/index.html
   58. http://lwn.net/2002/0124/dists.php3
   59. http://www.eklektix.com/
   60. http://www.eklektix.com/
 
 --- ifmail v.2.14.os7-aks1
  * Origin: Unknown (2:4615/71.10@fidonet)
 
 
