- RU.LINUX ---------------------------------------------------------------------
From : Sergey Lentsov 2:4615/71.10 16 May 2002 22:36:37
To : All
Subject : URL: http://www.lwn.net/2002/0516/kernel.php3
--------------------------------------------------------------------------------
See also: [11]last week's Kernel page.
Kernel development
The current development kernel release is 2.5.15, released on May 9.
Changes this time around include a resumption of the "device model"
work (with an emphasis on the x86 PCI code), more IDE reworking
(including the removal of /proc/ide - see [12]last week's LWN Kernel
Page), an NFS server update, many patches from the "dj" series, and
lots of other fixes and updates.
The in-progress 2.5.16 patch, as seen in BitKeeper, includes an ISDN
update, George Anzinger's [13]64-bit jiffies patch, the usual IDE
patches, some networking updates, work on the new NFS export scheme,
and more.
Dave Jones's latest patch is [14]2.5.15-dj1, which contains a
relatively small set of fixes and updates.
The latest [15]2.5 status summary from Guillaume Boissiere is dated
May 15.
The current stable kernel release is 2.4.18. No 2.4.19 prepatches have
been released by Marcelo this week.
The current patch from Alan Cox is [16]2.4.19-pre8-ac4. The biggest
change here is a new set of IDE updates by Andre Hedrick that went
into -ac3. The 2.4 and 2.5 IDE subsystems continue to go in very
different directions.
On the 2.2 front, Alan has released [17]2.2.21-rc4, the latest 2.2.21
release candidate. Unless something turns up, this one will become the
real 2.2.21.
The future of in-kernel web servers. Some recent discussion on
troubles with khttpd, the in-kernel web server which has been present
since the early 2.3 days, led to the statement that khttpd would soon
be removed from the 2.5 series. khttpd has a number of happy users,
but it has been essentially unmaintained for a number of years, and it
has been superseded by Ingo Molnar's TUX server. So the kernel
developers see little reason to keep it around.
The more interesting question, perhaps, is whether TUX will take the
place of khttpd. There appears to be little consensus on whether TUX
should go in or not. Some developers are worried about the impact of
the TUX patch, while others claim it affects little other code. It is
not clear how much of a performance benefit TUX really provides - some
user-space web servers are said to be getting quite close to TUX in
speed. And, of course, a number of people feel that an application
like a web server has no place inside the Linux kernel.
Servers like TUX and khttpd remain interesting as a demonstration of
how to create the shortest, fastest path between the network and files
on a disk. Chances are that TUX will find its way into a mainline
kernel sooner or later.
Per-driver filesystems made easy. Alexander Viro has long been a
proponent of small, special-purpose filesystems as a way for device
drivers (or other kernel subsystems) to communicate with user space.
The mini filesystem approach, he says, is a far cleaner and safer
technique than the alternatives: /proc, the ioctl() call, or devfs.
This approach makes sense to a number of people, but it has not been
widely adopted. After all, if you are not Al Viro (which is the case
for most of us), hacking up a new filesystem can be a little
intimidating.
So he has been trying for a while to make the task of writing driver
filesystems easier. His [18]latest posting includes a set of library
functions which mostly concern themselves with the creation of
superblocks for virtual filesystems. The superblock is a good thing to
hide within a library layer; virtual filesystems just need something
to hand to the VFS; there should be no need for each one to duplicate
a lot of "fill in the superblock field" code.
The other half of the posting is a driver which creates a little
filesystem to export the value of a set of VIA motherboard temperature
sensors. The whole thing takes up 70 lines of code, and much of that,
of course, is dealing with getting information from the sensors. The
task of creating special purpose virtual filesystems has indeed been
made easy.
The trickier part in the long run may be on the system administration
side. If the mini filesystem approach takes off, each system will have
to be configured to mount these filesystems in the right places. /proc
files and ioctl() calls just show up in their standard places, but
filesystems must be explicitly mounted somewhere. How are VIA
motherboard users to know that they can mount a devvia filesystem
somewhere to read their temperature sensors? Add in a dozen other
hardware-specific filesystems and one begins to see that some work on
system administration tools will be needed to make it all easy to
manage.
A different approach to asynchronous I/O. It started with a discussion
of the O_DIRECT flag, which can be used to request that "direct" I/O
be performed on a file. Direct I/O moves data directly between the
userspace buffer and the device performing the I/O, without copying
through kernel space. Direct I/O can be faster, since it avoids copy
operations and because it does not fill the system's page cache with
data that will not be used again.
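As a rough illustration of what a direct read looks like from user space, here is a minimal Python sketch. The names read_direct and BLOCK are made up for this example, and the 4096-byte alignment is an assumption; the real requirement depends on the device and filesystem. Direct I/O needs an aligned buffer, so the sketch borrows one from an anonymous mmap. Some filesystems refuse O_DIRECT, so it falls back to an ordinary cached read when the direct attempt fails.

```python
import mmap
import os

BLOCK = 4096  # assumed alignment; the true requirement depends on the device


def _pread_aligned(path, flags, length, offset):
    """Read into a page-aligned buffer (an anonymous mmap) and return bytes."""
    fd = os.open(path, flags)
    try:
        with mmap.mmap(-1, length) as buf:
            n = os.preadv(fd, [buf], offset)
            return buf[:n]
    finally:
        os.close(fd)


def read_direct(path, length=BLOCK, offset=0):
    """Try an O_DIRECT read; fall back to a cached read if it is refused."""
    direct = getattr(os, "O_DIRECT", 0)  # 0 on platforms without the flag
    try:
        return _pread_aligned(path, os.O_RDONLY | direct, length, offset)
    except OSError:
        # Some filesystems (tmpfs on older kernels, for example) reject O_DIRECT.
        return _pread_aligned(path, os.O_RDONLY, length, offset)
```

Both length and offset should be multiples of BLOCK when the direct path is taken. Note that the call does not return until the data has arrived, which is exactly the synchronous behavior discussed next.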
It was [19]noted recently that benchmarks using O_DIRECT tend to
perform worse than those using regular, cached I/O. The reason for
this poor performance is reasonably straightforward: direct I/O, as
implemented in Linux, is synchronous. The application must sleep and
wait for the operation to complete, and there is no opportunity to
reorder operations for better I/O performance. If you really want to
make O_DIRECT work well, you need to combine it with asynchronous I/O.
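The benefit of keeping several requests in flight can be approximated from user space with a thread pool. This is only a stand-in for real asynchronous I/O, not the kernel AIO patches mentioned here; the function name read_chunks_overlapped and its parameters are invented for this sketch, and it uses ordinary cached reads to stay portable.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def read_chunks_overlapped(path, chunk=4096, workers=4):
    """Read a whole file as fixed-size chunks, several requests at a time."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offsets = range(0, size, chunk)
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # os.pread() takes an explicit offset, so the threads can share
            # one descriptor; map() returns the results in offset order.
            parts = pool.map(lambda off: os.pread(fd, chunk, off), offsets)
            return b"".join(parts)
    finally:
        os.close(fd)
```

With several reads outstanding, the kernel and the device have a chance to reorder and merge requests, which a single blocking read never allows.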
So, one would think, there would be a motivation to get the
asynchronous I/O patches into the 2.5 kernel. Linus, however, has
other ideas, based on [20]his opinion of O_DIRECT:
    The thing that has always disturbed me about O_DIRECT is that the
    whole interface is just stupid, and was probably designed by a
    deranged monkey on some serious mind-controlling substances.
In other words, one might conclude that he doesn't like it.
A statement like that, of course, raises an immediate question: how,
exactly, would one design a high-performance, zero-copy, asynchronous
I/O subsystem if one can't get the monkeys to share their substances?
Linus's [21]answer is to split apart the two aspects of the
problem: performing the I/O and connecting the data to user space.
In this new scheme, a process wishing to do asynchronous, direct reads
from a file would, after opening that file, invoke a new system call:
    readahead (file_desc, offset, size);
This call will set the kernel to populating the system's page cache
with data from the file starting at the given offset, for an amount
approximating size. At this point, the data is in (kernel) memory, and
is not visible to the userspace application. Actually getting at the
data requires calling mmap with a special MAP_UNCACHED flag.
This memory mapping is special in a couple of ways. One is that it
does not set up any page tables when the mapping is established, so it
happens very quickly. The other is that, when the user application
generates a page fault (by trying to access the data it ordered with
readahead()), the page is "stolen" from the page cache and turned into
a private page belonging to the application. Until the fault happens,
the read operation is entirely asynchronous; once the application
actually tries to use the data, it will wait if the operation still
has not completed.
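MAP_UNCACHED exists only in Linus's sketch, so the read side cannot be tried as described. The shape of the idea can be approximated with existing calls, under loudly stated assumptions: posix_fadvise() with POSIX_FADV_WILLNEED stands in for the readahead step, and an ordinary private mapping stands in for the special one. The function name start_read is made up for this example. Unlike the proposal, a private mapping does not steal pages from the page cache; it shares them until they are written to.

```python
import mmap
import os


def start_read(path, offset, size):
    """Begin populating the page cache, then hand back a lazy mapping.

    offset must be a multiple of mmap.ALLOCATIONGRANULARITY, and the file
    must be at least offset + size bytes long.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        if hasattr(os, "posix_fadvise"):
            # Advisory only: asks the kernel to start reading, without
            # waiting for the data to arrive.
            os.posix_fadvise(fd, offset, size, os.POSIX_FADV_WILLNEED)
        # Stand-in for the proposed MAP_UNCACHED mapping. Pages are faulted
        # in on first access, not when the mapping is created.
        return mmap.mmap(fd, size, flags=mmap.MAP_PRIVATE,
                         prot=mmap.PROT_READ, offset=offset)
    finally:
        os.close(fd)  # the mapping keeps its own reference to the file
```

A caller would do other work after start_read() returns and touch the mapping only when the data is needed; that first access is where any remaining wait for the disk happens.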
If the application is, instead, looking to write data, it starts by
populating its mapped memory segment. When things are ready to go,
another new system call:
    mwrite (file_desc, address, length);
is used. mwrite() puts the page back into the page cache (where it
will get written eventually) and removes it from the process's page
table. The (new) fdatasync_area() system call may be used to force
(and wait for) specific pages to be written.
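Neither mwrite() nor fdatasync_area() exists outside this proposal. The closest thing available today is a shared file mapping plus msync(), which Python exposes as mmap.flush(). The sketch below uses that as an approximation only; the function name write_through_mapping is invented here. One important difference: a shared mapping's pages sit in the page cache from the start, whereas mwrite() would donate a private page to the cache.

```python
import mmap
import os


def write_through_mapping(path, offset, data):
    """Fill a mapped region with data, then force it out to the file.

    offset must be a multiple of mmap.ALLOCATIONGRANULARITY and data must
    be non-empty.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        end = offset + len(data)
        if os.fstat(fd).st_size < end:
            os.ftruncate(fd, end)  # the file must cover the mapped range
        with mmap.mmap(fd, len(data), flags=mmap.MAP_SHARED,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE,
                       offset=offset) as m:
            m[:] = data  # populate the mapped segment
            m.flush()    # msync(): plays the role of fdatasync_area()
    finally:
        os.close(fd)
```

In the proposed scheme the populate step and the hand-off to the page cache are separate, so the application could keep filling other pages while earlier ones are being written.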
A process which is simply copying data need never access the pages in
the mapping directly. In this case, no page tables ever get built, and
things go even more quickly. Pure copy cases are relatively rare,
though, especially since this scheme would not support I/O to network
connections (which do not use the page cache). The high-profile
application for this sort of I/O (or O_DIRECT) is Oracle, which
performs lots of I/O out of large segments.
So far, all this is just a scheme sketched out by Linus, with no
implementation to play with. Should some ambitious kernel hacker code
it up, however, it would be interesting to see how it really performs
relative to other techniques.
Corrections on the buffer head work. Andrew Morton politely pointed
out that your editor was more confused than usual when writing about
Andrew's buffer head work last week. The bulk of that work actually
affected the way the write() system call was handled. In the old
scheme, data to be written back to files would find its way into the
buffer head least-recently-used queue, where it would eventually be
flushed to disk. With the new code, this data is written directly from
the page cache, in a more page-oriented mode.
Buffer heads are still used to coordinate the I/O process, for now. As
a result of all the block layer work that has gone in, the block
system now takes those buffer heads and digs down to the real pages
underneath them. At some point, an obvious step will be to remove
the buffer head "middleman" and submit pages to be written directly
to the block layer. Eventually, then, buffer heads will no longer be
the main mechanism for block I/O.
Sorry for the confusion.
Other patches and updates released this week include:
Kernel trees:
* Martin Loschwitz: [22]2.5.15-ml2; looks like 2.5.15 plus recent,
mainstream patches.
* Joerg Prante: [23]2.4.19-pre8-jp12; ALSA, JFS, XFS, RMAP,
preemptible kernel, FreeS/WAN, etc.
* J.A. Magallon: [24]2.4.19-pre8-jam2.
* Andrea Arcangeli: [25]2.4.19-pre8-aa3.
Core kernel code:
* Rik van Riel: [26]I/O wait statistics.
* Rusty Russell: [27]Futex update.
* Hugh Dickins: [28]noht boot option to disable hyperthreading.
* Patricia Gaughen: [29]discontiguous memory support for ia32 NUMA
systems.
* Hanna Linder: [30]fast walk dcache for 2.4.19-pre8.
* Rusty Russell: hotplug CPU preparation, mostly dealing with the
management of idle tasks on new CPUs ([31]I, [32]II, [33]III,
[34]IV, and [35]V)
Device drivers:
* Martin Dalecki: IDE reworking ([36]59, [37]60, [38]61, [39]62a
(Linus [40]didn't like 62), [41]63, and [42]64)
* Bakonyi Ferenc: [43]RivaTV driver 0.8.0.
* Denis Oliver Kropp: [44]VMWare framebuffer driver, version 0.5.2.
* Richard Gooch: [45]devfs v199.14 for 2.4.19-pre8 and [46]version
213 for 2.5.15.
* Johannes Erdfelt: [47]rework USB device reference counting.
* Greg Kroah-Hartman: [48]further rework USB reference counting.
* Neil Brown: make RAID 5 work in 2.5 ([49]1, [50]2, and [51]3)
Filesystems:
* Anton Altaparmakov: [52]NTFS 2.0.7.
* Pawel Kot: [53]backport of NTFS 2.0.7 for 2.4.18.
* Jan Harkes: new iget_locked() function for inode creation ([54]1,
[55]2, [56]3, [57]4, [58]5, and [59]6)
* Peter Chubb: [60]remove 2TB filesystem size limit.
* Hirotaka Sasaki: [61]alternative patch to remove the 2TB limit.
Kernel building:
* Keith Owens: [62]kbuild 2.5 core-14. Keith has also posted
[63]another note stating that kbuild is ready for inclusion.
* Andi Kleen: add a [64]CONFIG_ISA option.
Miscellaneous:
* Denis Vlasenko: [65]kernel maintainers file.
* Karim Yaghmour: [66]Linux Trace Toolkit for 2.5.15.
* Neil Brown: [67]mdadm tool 1.0.0 for the management of RAID sets.
* Greg Kroah-Hartman: [68]pcihpview 0.3, a GUI tool for PCI hotplug
management.
* Patricia Gaughen: [69]updated NUMA status page.
* Jari Ruusu: [70]loop-AES 1.6c file and swap crypto package.
* [71]Kernel Traffic #166 is available.
Ports:
* James Bottomley: [72]NCR Voyager port.
* James Bottomley: [73]split up i386 code into subarchitectures.
* Robert Love: [74]preemptible kernel for MIPS processors for
2.4.19-pre8.
Section Editor: [75]Jonathan Corbet
May 16, 2002
For other kernel news, see:
* [79]Kernel traffic
* [80]Kernel Newsflash
* [81]Kernel Trap
* [82]2.5 Status
Other resources:
* [83]L-K mailing list FAQ
* [84]Linux-MM
* [85]Linux Scalability Effort
* [86]Kernel Newbies
* [87]Linux Device Drivers
[89]Eklektix, Inc. Linux powered! Copyright © 2002 [90]Eklektix, Inc.,
all rights reserved
Linux (R) is a registered trademark of Linus Torvalds
References
1. http://lwn.net/
2. http://lwn.net/2002/0516/
3. http://lwn.net/2002/0516/security.php3
4. http://lwn.net/2002/0516/dists.php3
5. http://lwn.net/2002/0516/devel.php3
6. http://lwn.net/2002/0516/commerce.php3
7. http://lwn.net/2002/0516/press.php3
8. http://lwn.net/2002/0516/announce.php3
9. http://lwn.net/2002/0516/letters.php3
10. http://lwn.net/2002/0516/bigpage.php3
11. http://lwn.net/2002/0509/kernel.php3
12. http://lwn.net/2002/0509/kernel.php3
13. http://lwn.net/2002/0516/a/64-bit-jiffies.php3
14. http://lwn.net/2002/0516/a/2.5.15-dj1.php3
15. http://lwn.net/2002/0516/a/2.5-status.php3
16. http://lwn.net/2002/0516/a/2.4.19-pre8-ac4.php3
17. http://lwn.net/2002/0516/a/2.2.21-rc4.php3
18. http://lwn.net/2002/0516/a/driverfs-made-easy.php3
19. http://lwn.net/2002/0516/a/O_DIRECT-performance.php3
20. http://lwn.net/2002/0516/a/lt-deranged-monkey.php3
21. http://lwn.net/2002/0516/a/lt-async.php3
22. http://lwn.net/2002/0516/a/2.5.15-ml2.php3
23. http://lwn.net/2002/0516/a/2.4.19-pre8-jp12.php3
24. http://lwn.net/2002/0516/a/2.4.19-pre8-jam2.php3
25. http://lwn.net/2002/0516/a/2.4.19-pre8-aa3.php3
26. http://lwn.net/2002/0516/a/iowait-stats.php3
27. http://lwn.net/2002/0516/a/futex.php3
28. http://lwn.net/2002/0516/a/noht.php3
29. http://lwn.net/2002/0516/a/discontig.php3
30. http://lwn.net/2002/0516/a/fastwalk.php3
31. http://lwn.net/2002/0516/a/hotplug-cpu-1.php3
32. http://lwn.net/2002/0516/a/hotplug-cpu-2.php3
33. http://lwn.net/2002/0516/a/hotplug-cpu-3.php3
34. http://lwn.net/2002/0516/a/hotplug-cpu-4.php3
35. http://lwn.net/2002/0516/a/hotplug-cpu-5.php3
36. http://lwn.net/2002/0516/a/ide-59.php3
37. http://lwn.net/2002/0516/a/ide-60.php3
38. http://lwn.net/2002/0516/a/ide-61.php3
39. http://lwn.net/2002/0516/a/ide-62a.php3
40. http://lwn.net/2002/0516/a/lt-ide-62.php3
41. http://lwn.net/2002/0516/a/ide-63.php3
42. http://lwn.net/2002/0516/a/ide-64.php3
43. http://lwn.net/2002/0516/a/rivatv.php3
44. http://lwn.net/2002/0516/a/vmwarefb.php3
45. http://lwn.net/2002/0516/a/devfs-v199.14.php3
46. http://lwn.net/2002/0516/a/devfs-v213.php3
47. http://lwn.net/2002/0516/a/usb-refcount.php3
48. http://lwn.net/2002/0516/a/usb-refcount-2.php3
49. http://lwn.net/2002/0516/a/raid-1.php3
50. http://lwn.net/2002/0516/a/raid-2.php3
51. http://lwn.net/2002/0516/a/raid-3.php3
52. http://lwn.net/2002/0516/a/ntfs.php3
53. http://lwn.net/2002/0516/a/ntfs-207a.php3
54. http://lwn.net/2002/0516/a/iget-1.php3
55. http://lwn.net/2002/0516/a/iget-2.php3
56. http://lwn.net/2002/0516/a/iget-3.php3
57. http://lwn.net/2002/0516/a/iget-4.php3
58. http://lwn.net/2002/0516/a/iget-5.php3
59. http://lwn.net/2002/0516/a/iget-6.php3
60. http://lwn.net/2002/0516/a/2tb.php3
61. http://lwn.net/2002/0516/a/2tb-2.php3
62. http://lwn.net/2002/0516/a/kbuild.php3
63. http://lwn.net/2002/0516/a/kbuild-ready.php3
64. http://lwn.net/2002/0516/a/config-isa.php3
65. http://lwn.net/2002/0516/a/maintainers.php3
66. http://lwn.net/2002/0516/a/ltt.php3
67. http://lwn.net/2002/0516/a/mdadm.php3
68. http://lwn.net/2002/0516/a/pcihpview.php3
69. http://lse.sf.net/numa/numastatus.html
70. http://lwn.net/2002/0516/a/loop-aes.php3
71. http://kt.zork.net/kernel-traffic/kt20020513_166.html
72. http://lwn.net/2002/0516/a/voyager.php3
73. http://lwn.net/2002/0516/a/i386-split.php3
74. http://lwn.net/2002/0516/a/preempt-mips.php3
75. mailto:lwn@lwn.net
76. http://oasis.lwn.net/oasisc.php?s=5&c=30&cb=1981253904&url=http%3A%2F%2Fjobs.pnl.gov%2Fasp%2FReqDescr%2FReqDescr.asp%3Fv_ReqNbr%3D103909%26company%3DPNL
77. http://oasis.lwn.net/oasisc.php?s=5&c=30&cb=1981253904&url=http%3A%2F%2Fjobs.pnl.gov%2Fasp%2FReqDescr%2FReqDescr.asp%3Fv_ReqNbr%3D103909%26company%3DPNL
78. http://oasis.lwn.net/oasisc.php?s=5&c=30&cb=1981253904&url=http%3A%2F%2Fwww.pnl.gov%2Fnews%2F2002%2Fcomputer.htm
79. http://kt.zork.net/
80. http://www.atnf.csiro.au/~rgooch/linux/docs/kernel-newsflash.html
81. http://www.kerneltrap.com/
82. http://kernelnewbies.org/status/
83. http://www.tux.org/lkml/
84. http://linux-mm.org/
85. http://lse.sourceforge.net/
86. http://www.kernelnewbies.org/
87. http://www.xml.com/ldd/chapter/book/index.html
88. http://lwn.net/2002/0516/dists.php3
89. http://www.eklektix.com/
90. http://www.eklektix.com/
--- ifmail v.2.14.os7-aks1
* Origin: Unknown (2:4615/71.10@fidonet)