RU.LINUX
From    : Sergey Lentsov (2:4615/71.10), 26 Jan 2001 11:46:03
To      : All
Subject : URL: http://lwn.net/2001/0125/kernel.php3
See also: [13]last week's Kernel page.
Kernel development
The current kernel release is still 2.4.0. Linus continues to put
together a 2.4.1 prepatch, currently at [14]2.4.1-pre10. His approach
remains conservative, and this patch (especially if you ignore
ReiserFS) is relatively small.
Those looking for something meatier may want to consider, instead,
[15]2.4.0-ac11 from Alan Cox. This release contains literally hundreds
of patches - almost 10MB worth.
Cutting out the middleman in data transfers. The discussion started by
David Miller's posting of an experimental zero-copy networking
implementation (discussed on this page [16]two weeks ago) continues,
though it has moved into new areas. One of those is the optimization
of data transfers to avoid copying the data as much as possible.
Consider, for example, the sendfile() interface that Linux supports
now; using sendfile(), an application (a web server, say) can transfer
a disk file to a network socket without ever having to read it into
user space. There is an obvious performance gain from operating in
this mode for certain applications.
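
As a rough illustration of the sendfile() pattern described above, here
is a minimal sketch of how a server might push a file to a connected
socket. The helper name send_file_path() is made up for this example,
and error handling is kept to the bare minimum; it is a sketch under
those assumptions, not a recommended implementation.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the article): send the whole file at
 * "path" to the connected socket "sock" with sendfile(), so the data
 * is never read into a user-space buffer.
 * Returns the number of bytes sent, or -1 on error.
 */
ssize_t send_file_path(int sock, const char *path)
{
    struct stat st;
    off_t offset = 0;           /* sendfile() advances this for us */
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }

    while (offset < st.st_size) {
        ssize_t n = sendfile(sock, fd, &offset,
                             (size_t)(st.st_size - offset));
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: try again */
            close(fd);
            return -1;
        }
        if (n == 0)
            break;              /* file shrank while we were sending */
    }

    close(fd);
    return (ssize_t)offset;
}
```

The loop is there because sendfile() may transfer fewer bytes than
requested; the offset it updates tells the caller where to resume.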
So, why not extend the idea to its logical conclusion? Why not have a
system call that says "copy data from here to there, and optimize as
much as possible"? One approach to this mode is [17]Larry McVoy's
'splice' interface, which tries to provide a general way for user
space processes to control high-performance copies. It provides "push"
and "pull" primitives which handle the destination and source sides of
a copy, respectively, and give the application some latitude in how
the two are put together.
Here are [18]Linus's comments on splice and why it has not been
implemented so far. Essentially, sendfile handled the task that most
users wanted, the splice interface needed a bit of work, and it didn't
fit well into the structure of the kernel at the time. The kernel has
since evolved, and Linus's message hints that an implementation of a
modified form of splice would be easier now, and that it might even be
accepted.
One can take the idea further, however: why not, when appropriate,
simply tell the hardware to copy the data between devices directly and
leave the kernel (and the processor) out of it altogether? According
to Linus, that's one of those great ideas that turns out not to be so
great in practice. His [19]short response to the idea was:
device-to-device copies sound like the ultimate thing.
They suck. They add a lot of complexity and do not work in general.
And, if your "normal" usage pattern really is to just move the data
without even looking at it, then you have to ask yourself whether
you're doing something worthwhile in the first place.
Further into the discussion, Linus came up with other reasons to avoid
direct device-to-device (D2D?) copies. One is that [20]there is very
little use for the capability in the end. One can talk, for example,
of streaming video directly to disk - but how often will a user be
recording video without wanting to look at it too? Another is that
[21]very little hardware supports that mode of operation. Linus sees a
trend toward connecting hardware with direct, point-to-point links
that are not amenable to direct operations between devices. Quoth
Linus: "Just wait. My crystal ball is infallible."
TCP_CORK or MSG_MORE? Another branch of the same discussion has to do
with getting optimal performance from network transfers. Imagine a web
server using the sendfile() interface described above. In response to
a request for a page, the server will first write out a short set of
HTTP headers, then use sendfile() to actually transfer the page data.
By the time the sendfile() call is actually made, however, the headers
will have gone out on the net as a very short packet. The result is
poor performance on both the sending and receiving side.
Linux has handled this issue with a TCP option called TCP_CORK. If an
application sets that option on a socket, the kernel will not send out
short packets. Instead, it will wait until enough data has shown up to
fill a maximum-size packet, then send it. When TCP_CORK is turned off,
any remaining data will go out on the wire.
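
The corking sequence just described might look something like the
following sketch. The helper names set_cork() and send_corked() are
invented for this example, and the body is sent with a plain write()
for brevity; a real server would more likely use sendfile() for that
step. Short writes are not handled here.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: set (on != 0) or clear (on == 0) TCP_CORK. */
int set_cork(int sock, int on)
{
    return setsockopt(sock, IPPROTO_TCP, TCP_CORK, &on, sizeof(on));
}

/*
 * Hypothetical helper: send a short header followed by a body while
 * the socket is corked, so the header does not have to go out as a
 * tiny packet of its own.
 * Returns the total number of bytes written, or -1 on error.
 */
ssize_t send_corked(int sock, const void *hdr, size_t hdr_len,
                    const void *body, size_t body_len)
{
    ssize_t a, b;

    if (set_cork(sock, 1) < 0)
        return -1;

    a = write(sock, hdr, hdr_len);          /* held back while corked */
    b = (a < 0) ? -1 : write(sock, body, body_len);

    /* Always uncork, even after a failed write: this pushes out
     * whatever partial data is still queued. */
    if (set_cork(sock, 0) < 0)
        return -1;

    return (a < 0 || b < 0) ? -1 : a + b;
}
```

Note that the cork is persistent state on the socket: forgetting the
final set_cork(sock, 0) leaves the tail of the response waiting in the
kernel.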
TCP_CORK does the job reasonably well. Recently, however, a contingent
led by Ingo Molnar has been [22]pushing for a new interface which uses
a flag called MSG_MORE. Rather than applying to the socket in general,
MSG_MORE is attached to one or more write operations on that socket.
It says "there will be more data coming," and the kernel knows to
buffer data to get bigger packets. The advantages of this approach are
said to be (1) it requires no persistent state on the socket, thus
helping, among other things, to avoid programming errors; and (2) it
avoids the system call overhead of toggling the TCP_CORK flag. Ingo
used MSG_MORE in the implementation of the TUX kernel web server, and
is happy with the results.
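
For comparison, a per-call flag of this kind could be used roughly as
in the sketch below. It assumes a system where send() accepts MSG_MORE,
which was not yet the case for user space when this was written; the
send(2) manual page lists the flag as available since Linux 2.4.4. The
helper name send_with_more() is invented for the example, and short
writes are not handled.

```c
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: send a header and a body, flagging the header
 * with MSG_MORE so the kernel knows more data follows and can hold it
 * back to build a fuller packet. No socket option is toggled.
 * Returns the total number of bytes sent, or -1 on error.
 */
ssize_t send_with_more(int sock, const void *hdr, size_t hdr_len,
                       const void *body, size_t body_len)
{
    ssize_t a, b;

    a = send(sock, hdr, hdr_len, MSG_MORE); /* "more data is coming" */
    if (a < 0)
        return -1;

    b = send(sock, body, body_len, 0);      /* no flag: nothing more to wait for */
    if (b < 0)
        return -1;

    return a + b;
}
```

Nothing has to be undone afterward, but every call site that writes to
the socket has to know about the flag.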
Linus, however, is not convinced. MSG_MORE requires a flag to be set
on every transfer, only works on sockets, and requires that the code
that is doing the writing be aware of the flag. TCP_CORK, instead,
works with programs using the standard I/O package, and it can be set
on sockets that are passed to other applications, such as CGI scripts,
that are completely unaware of its presence. The TCP_CORK flag
preserves a lot more of the standard Unix stream semantics.
Conclusion: don't expect to see MSG_MORE show up in user space anytime
soon.
Fixing the 2.4.0 USB breakage. When 2.4.0 came out, it included a
last-minute change to the usb_device_id structure, which is used to
find driver modules for specific USB devices. Unfortunately, the form
of this change was such that it broke the USB autoloading mechanism
entirely. Since then, the USB maintainers, along with modutils
maintainer Keith Owens, have been trying to figure out a way to make
things work again.
The problem is that modutils, which handles the actual module loading
process, cannot distinguish the new usb_device_id structure from the
old one. Making modutils work with the 2.4.0 version of the structure
is not a problem - but then it will cease to work for earlier
versions. Keith Owens places great importance on backward
compatibility, and does not want to break things for any version. So
he has produced [23]a kernel patch which adds a version number to the
relevant structures. With versioning, changes can be detected and
everything can be made to work.
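
The general idea of versioning a binary table can be sketched as
follows. Everything here is hypothetical: the structure names, the
fields, and the two layouts are invented for illustration and are
neither the real usb_device_id structure nor the contents of the patch.

```c
#include <stdint.h>

/* Hypothetical entry layouts; neither is the real usb_device_id. */
struct demo_id_v1 {
    uint16_t vendor;
    uint16_t product;
};

struct demo_id_v2 {
    uint16_t match_flags;   /* field added in "version 2" */
    uint16_t vendor;
    uint16_t product;
};

/* Hypothetical table header: the version says how to read "entries". */
struct demo_table {
    uint32_t version;
    uint32_t count;
    const void *entries;
};

/*
 * Returns 1 if the table lists (vendor, product), 0 if it does not,
 * and -1 if the table's version is one this reader does not know.
 */
int demo_table_matches(const struct demo_table *t,
                       uint16_t vendor, uint16_t product)
{
    uint32_t i;

    switch (t->version) {
    case 1: {
        const struct demo_id_v1 *e = t->entries;
        for (i = 0; i < t->count; i++)
            if (e[i].vendor == vendor && e[i].product == product)
                return 1;
        return 0;
    }
    case 2: {
        const struct demo_id_v2 *e = t->entries;
        for (i = 0; i < t->count; i++)
            if (e[i].vendor == vendor && e[i].product == product)
                return 1;
        return 0;
    }
    default:
        return -1;          /* unknown layout: refuse to guess */
    }
}
```

Without the version field, a reader handed the second layout would
silently interpret match_flags as a vendor ID, which is the kind of
ambiguity described above.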
Linus, however, [24]does not want to apply the patch. It is, after
all, a binary interface change; such changes are generally avoided
within a stable kernel series. Besides, the only other kernels which
used the USB device table were the 2.4.0-test kernels - that structure
was added in 2.4.0-test10. Nobody feels all that bad about breaking
the prerelease kernels, in the end.
Almost nobody, that is; Mr. Owens is still not entirely happy. He has
released [25]modutils-2.4.2 which makes the 2.4.0 format work, but he
has done so "under protest." People who want to be able to switch
between 2.4.0 and the 2.4.0-test kernels will have to keep two
versions of modutils around; everybody else can just install 2.4.2 and
USB autoloading will work again.
Should the kbuild list move to SourceForge? Michael Elizabeth Chastain
has posted [26]a proposal to move the kbuild mailing list (which
discusses the kernel configuration and building system) to a
SourceForge project. He has a few reasons, but any kbuild reader will
know the first one intuitively: spam routinely exceeds real postings
on that list. With luck, moving to a site with better spam filtering
would help to make the list usable again.
The one objection to the move came in the form of [27]this posting,
which raised the concern that the free software world is becoming too
dependent on SourceForge.
But it just concerns me when a single company has the ability to
(temporarily) freeze the development of half the world's
open-source software just by unplugging a roomful of servers,
either voluntarily or not (think "court order").
This is a concern that LWN has raised in the past as well. This time,
however, there was a semi-official response in the form of [28]this
message from Eric Raymond, who is on the VA Linux board of directors.
According to Eric:
We're not blind to this problem. We don't want to be a chokepoint;
it's in VA's interest for the community to know it's protected
against accident or malfeasance. This is why we're developing a
network of active mirror sites -- not just to improve performance,
but so one of them could take the baton if the SourceForge primary
site had to shut down for some reason.
It is good to see an acknowledgement of this concern from VA.
SourceForge is a great resource, but it has led to an unprecedented
concentration of free software projects in a single place.
Other patches and updates released this week include:
* Neil Brown has released [29]a RAID5 patch which should fix the
filesystem corruption problems that people have been reporting.
* Douglas Gilbert's [30]The Linux SCSI subsystem in 2.4 HOWTO has
been accepted by the Linux Documentation Project. It describes the
SCSI system from a user's perspective, with much useful
information on SCSI configuration and operation.
* [31]Dynamic Probes 1.3 has been released by Suparna Bhattacharya
at IBM.
* Heinz Mauelshagen has [32]released version 0.9.1beta2 of the
Logical Volume Manager subsystem.
* [33]A new multi-queue scheduling patch has been released by Mike
Kravetz. It includes a set of benchmark results that would appear
to indicate much improved performance when dealing with large
numbers of processes.
* Robert de Vries has posted [34]a version of the POSIX timers patch
for the 2.4.0 kernel.
* [35]Version 1.8 of the x86 performance monitoring counters driver
has been released by Mikael Pettersson.
* Rusty Russell posted [36]a patch fixing some 2.4.0 netfilter bugs.
* A.M. Kuchling has written up [37]a look at Linux kernel
development, comparing it (unfavorably) with how Python
development is handled.
* Greg K-H has released [38]a new version of the hotplug scripts
package.
* If you're looking for a kernel hacking task to jump into, there's
a whole set waiting on the new [39]netfilter TODO list, maintained
by Harald Welte.
* David Miller has released [40]an updated version of his zero-copy
networking patch.
* Sam Watters has posted [41]the 'PAGG and Job module' for the 2.4.0
kernel. It is a job-level accounting system which has been
developed by Los Alamos National Laboratory and SGI. This module
works with the [42]Comprehensive System Accounting package, also
just released.
Section Editor: [43]Jonathan Corbet
January 25, 2001
For other kernel news, see:
* [44]Kernelnotes
* [45]Kernel traffic
* [46]Kernel Newsflash
* [47]Kernel Trap
Other resources:
* [48]Kernel Source Reference
* [49]L-K mailing list FAQ
* [50]Linux-MM
* [51]Linux Scalability Project
[53]Eklektix, Inc. Linux powered! Copyright © 2001 [54]Eklektix, Inc.,
all rights reserved
Linux ® is a registered trademark of Linus Torvalds
References
1. http://lwn.net/
2. http://ads.tucows.com/click.ng/pageid=001-012-132-000-000-003-000-000-012
3. http://lwn.net/2001/0125/
4. http://lwn.net/2001/0125/security.php3
5. http://lwn.net/2001/0125/dists.php3
6. http://lwn.net/2001/0125/devel.php3
7. http://lwn.net/2001/0125/commerce.php3
8. http://lwn.net/2001/0125/press.php3
9. http://lwn.net/2001/0125/announce.php3
10. http://lwn.net/2001/0125/history.php3
11. http://lwn.net/2001/0125/letters.php3
12. http://lwn.net/2001/0125/bigpage.php3
13. http://lwn.net/2001/0118/kernel.php3
14. http://lwn.net/2001/0125/a/2.4.1-pre10.php3
15. http://lwn.net/2001/0125/a/2.4.0-ac11.php3
16. http://lwn.net/2001/0111/kernel.php3
17. http://lwn.net/2001/0125/a/splice.php3
18. http://lwn.net/2001/0125/a/lt-splice.php3
19. http://lwn.net/2001/0125/a/lt-devcopy.php3
20. http://lwn.net/2001/0125/a/lt-devcopy2.php3
21. http://lwn.net/2001/0125/a/lt-devcopy3.php3
22. http://lwn.net/2001/0125/a/msg_more.php3
23. http://lwn.net/2001/0125/a/ko-usb-patch.php3
24. http://lwn.net/2001/0125/a/lt-usb-patch.php3
25. http://lwn.net/2001/0125/a/modutils-2.4.2.php3
26. http://lwn.net/2001/0125/a/kbuild-move.php3
27. http://lwn.net/2001/0125/a/kbuild-worries.php3
28. http://lwn.net/2001/0125/a/esr-sourceforge.php3
29. http://lwn.net/2001/0125/a/raid5-patch.php3
30. http://linuxdoc.org/HOWTO/SCSI-2.4-HOWTO/index.html
31. http://lwn.net/2001/0125/a/dynamic-probes.php3
32. http://lwn.net/2001/0125/a/lvm.php3
33. http://lwn.net/2001/0125/a/mqs.php3
34. http://lwn.net/2001/0125/a/posix-timers.php3
35. http://lwn.net/2001/0125/a/pmcd.php3
36. http://lwn.net/2001/0125/a/netfilter-patch.php3
37. http://www.amk.ca/writing/linux-devel.html
38. http://lwn.net/2001/0125/a/hotplug-scripts.php3
39. http://lwn.net/2001/0125/a/netfilter-todo.php3
40. http://lwn.net/2001/0125/a/zero-copy.php3
41. http://lwn.net/2001/0125/a/pagg.php3
42. http://lwn.net/2001/0125/a/csa.php3
43. mailto:lwn@lwn.net
44. http://www.kernelnotes.org/
45. http://kt.linuxcare.com/
46. http://www.atnf.csiro.au/~rgooch/linux/docs/kernel-newsflash.html
47. http://www.kerneltrap.com/
48. http://lksr.org/
49. http://www.tux.org/lkml/
50. http://www.linux.eu.org/Linux-MM/
51. http://www.citi.umich.edu/projects/linux-scalability/
52. http://lwn.net/2001/0125/dists.php3
53. http://www.eklektix.com/
54. http://www.eklektix.com/