RU.LINUX
From : Sergey Lentsov 2:4615/71.10, 08 Feb 2001 18:31:34
To : All
Subject : URL: http://lwn.net/2001/0208/kernel.php3
See also: [13]last week's Kernel page.
Kernel development
The current kernel release is still 2.4.1. The two usual prepatch
tracks are in full swing. On the Linus side, there is
[14]2.4.2pre1, released just after LinuxWorld. It contains a small set
of fixes, and does not yet address the known 2.4.1 problems (see
below). Alan Cox, meanwhile, has released [15]2.4.1ac5, which contains
a much larger set of fixes.
On the 2.2 kernel front, Alan has released [16]2.2.19pre8. A few
things apparently have yet to go into this patch, so the real 2.2.19
release is not yet imminent.
Some difficulties with 2.4.1. While most users are running 2.4.1
without trouble, a couple of issues have come up that are worth
knowing about:
* There is a bug in the handling of Unix datagram sockets which
locks up the kernel - or at least one processor on SMP systems.
Chris Evans has posted [17]a simple test program which
demonstrates the bug - don't run it on your big server. [18]A
patch for this bug exists, and will certainly be merged into the
next kernel prepatches.
* Hans Reiser has posted [19]a message on stability problems with
ReiserFS. There are currently [20]five outstanding bugs with this
filesystem, not all of which have fixes available yet (one of them
looks like a hardware problem rather than a real ReiserFS bug).
Neither of these issues is all that surprising. Every major stable
kernel release seems to have one denial of service bug lurking
somewhere; it takes a larger testing community to flush it out.
Similarly, ReiserFS is now seeing testing on a far larger scale than
it ever has in the past, and a few surprises are certain to show up.
This is the late stage of the free software development process in
action; fixes are being made quickly, and the end result will be a
more stable kernel.
ReiserFS systems can also crash for another reason, one that is not a
ReiserFS bug. It seems that some people are building the 2.4.x kernel
with Red Hat's "gcc-2.96" compiler that was shipped with Red Hat 7.
That compiler has some, um, issues, and it miscompiles some of the
ReiserFS code. If you're running a recent Red Hat system, be sure to
build your kernels with "kgcc," or at least get the latest, patched
gcc from Red Hat (which is said to work much better).
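As an illustration only, a build could flag the problem compiler by checking the version macros that GCC predefines (__GNUC__ and __GNUC_MINOR__). This is a hypothetical sketch: the function names and the warning text are invented here, and it does not describe how the kernel's own build selects or rejects compilers.

```c
#include <assert.h>

/* Nonzero if a GCC major/minor version pair is 2.96, the compiler
 * shipped with Red Hat 7 that is reported to miscompile ReiserFS. */
static int is_gcc_296(int major, int minor)
{
    return major == 2 && minor == 96;
}

/* Nonzero if this file itself was compiled by gcc 2.96. __GNUC__ and
 * __GNUC_MINOR__ are predefined by GCC (and by compilers imitating it). */
static int built_with_gcc_296(void)
{
#if defined(__GNUC__) && defined(__GNUC_MINOR__)
    return is_gcc_296(__GNUC__, __GNUC_MINOR__);
#else
    return 0;
#endif
}

/* Hypothetical build-time guard: warn loudly instead of failing quietly. */
#if defined(__GNUC__) && __GNUC__ == 2 && __GNUC_MINOR__ == 96
#warning "gcc 2.96 detected: consider building the kernel with kgcc"
#endif
```

Under gcc 2.96 the #warning fires at build time; under any other compiler the guard is skipped and built_with_gcc_296() returns 0.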
The great kiobuf debate. Recently, a fairly fierce debate has been
filling up mailboxes on the linux-kernel and kiobuf-io-devel mailing
lists. It all has to do with the kiobuf data structure which, until
recently, was generally seen as a good addition to the kernel in the
2.3 series.
The kiobuf structure was added, initially, to support raw disk I/O;
kiobufs and their supporting routines make it easy for kernel code to
move data directly between user space and a device, without an
intervening copy into kernel space, and without having to worry about
the ugly details of memory management. Their use has slowly grown; in
the 2.4.1 kernel, kiobufs can be found in the generic SCSI (sg) driver
and in the logical volume manager code. There is also [21]a patch
floating around that uses kiobufs to implement direct, user-to-user
pipes. And SGI's [22]XFS patch not only uses kiobufs, but modifies the
block I/O subsystem to make them integral to disk I/O.
One would think that kiobufs were taking over, except for the little
fact that the zero-copy networking patches do not use them. Instead, a
new and completely different mechanism for direct userspace access was
created. In the discussion that followed, it turned out that quite a
few people, including Linus, are not pleased with the kiobuf design.
In (very) simplified terms, that design is as follows: a kiobuf, in
the end, consists of an array of struct page structures, along with an
initial offset and a total length value. By using page structures
directly, the kiobuf allows the code using it to avoid dealing with
virtual memory entirely, since a struct page refers directly to a
physical page. The initial offset tells where, in the first page, the
data starts; all the remaining pages are filled with data starting at
the beginning of the page. A kiobuf thus describes a single,
contiguous area; working with multiple areas requires using a "kiovec"
(an array of kiobufs) instead.
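To make the layout concrete, here is a minimal C sketch, assuming a 4096-byte page and using hypothetical names (struct kiobuf_sketch, pages_spanned(), kiobuf_bytes_in_page()); the kernel's real struct kiobuf has more fields and different names. It shows how a user buffer reduces to a first-page offset plus a run of pages, and how much data each page holds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of a kiobuf; not the kernel's own. */
#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_MAX_PAGES 16

struct page_sketch {
    unsigned long pfn;                  /* stand-in for a physical page */
};

struct kiobuf_sketch {
    int nr_pages;                       /* entries in use in pages[] */
    size_t offset;                      /* where data starts in pages[0] */
    size_t length;                      /* total bytes described */
    struct page_sketch *pages[SKETCH_MAX_PAGES]; /* embedded page array */
};

/* Pages spanned by a user buffer at virtual address addr, len > 0 bytes. */
static int pages_spanned(unsigned long addr, size_t len)
{
    size_t offset = addr % SKETCH_PAGE_SIZE;
    return (int)((offset + len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE);
}

/* Bytes held in page i: page 0 starts at kb->offset, every later page
 * starts at its beginning, and the last page may be partly filled. */
static size_t kiobuf_bytes_in_page(const struct kiobuf_sketch *kb, int i)
{
    size_t start, end;   /* positions measured from the start of page 0 */

    assert(i >= 0 && i < kb->nr_pages);
    start = (i == 0) ? kb->offset : (size_t)i * SKETCH_PAGE_SIZE;
    end = (size_t)(i + 1) * SKETCH_PAGE_SIZE;
    if (end > kb->offset + kb->length)
        end = kb->offset + kb->length;
    return end - start;
}
```

For a 10,000-byte buffer that begins 100 bytes into a page, the sketch reports three pages holding 3996, 4096 and 1908 bytes respectively.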
The objections to this design include:
* It is said to be a very heavyweight structure. Kiobufs are a bit
large, mostly due to the incorporation of an array for the page
structures. Ingo Molnar has [23]characterized kiobufs as "big fat
monster-trucks of IO workload."
* Kiobufs do not handle scatter/gather operations (those which work
from multiple, noncontiguous memory areas) very gracefully; such
an operation requires setting up a kiovec and using several
kiobufs which, as previously noted, are already criticized as
being too large. Networking, in particular, makes heavy use of
scatter/gather I/O, and needs to be able to set up and tear down
structures very quickly.
* One of the reasons that kiobufs are difficult for scatter/gather
operations is that they assume that all data is aligned on page
boundaries, with the exception of the first page. That tends to be
true for disk I/O, but is rarely the case for networking. Linus,
in particular, [24]doesn't want any page alignment assumptions in
this sort of code.
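The alignment rule is what makes networking awkward. The following hypothetical sketch (same 4096-byte page assumption; struct frag_sketch and kiobufs_needed() are invented for illustration) counts how many kiobufs a kiovec would need for a list of page fragments, given that a fragment can only extend the current kiobuf if the previous fragment ran to the end of its page and this one starts at offset zero.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u

/* One piece of data: a byte range inside a single page (hypothetical). */
struct frag_sketch {
    unsigned offset;   /* start of data within its page */
    unsigned len;      /* bytes of data; offset + len <= SKETCH_PAGE_SIZE */
};

/* Count the kiobufs needed when only the first page of each kiobuf may
 * start mid-page and every page but the last must run to the page end. */
static int kiobufs_needed(const struct frag_sketch *f, size_t n)
{
    int count = 0;

    for (size_t i = 0; i < n; i++) {
        assert(f[i].len > 0 && f[i].offset + f[i].len <= SKETCH_PAGE_SIZE);
        int continues = i > 0
            && f[i - 1].offset + f[i - 1].len == SKETCH_PAGE_SIZE
            && f[i].offset == 0;
        if (!continues)
            count++;   /* start a new kiobuf in the kiovec */
    }
    return count;
}
```

A disk-style transfer such as {100, 3996}, {0, 4096}, {0, 500} fits in one kiobuf, while three 1500-byte packets, each at the start of its own page, need three kiobufs and therefore a kiovec.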
In the end, the fight seems to boil down to this: should a kiobuf
include an array of offset/length pairs for each page within the
buffer? With such an array, scatter/gather operations could be
described with a single kiobuf, and the kiovec idea could go away.
Linus [25]takes the position that the offset and length values should
be pushed down deep in the structure in this way. Kiobuf designer
Stephen Tweedie, however, [26]disagrees: in his view, putting the
length and offset at that level would make it hard to get the
completion status of any individual segment, and would tend to split
apart large requests which should really stay together.
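For comparison, a minimal sketch of the layout Linus favors, with an offset/length pair stored for every page, might look like the following. The names are hypothetical, and the fixed capacity of 16 fragments is an arbitrary choice for the example.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_MAX_FRAGS 16

/* Hypothetical per-page descriptor: each entry carries its own offset
 * and length, so no page alignment is assumed anywhere. */
struct page_frag_sketch {
    unsigned long pfn;   /* stand-in for a struct page pointer */
    unsigned offset;     /* start of data within this page */
    unsigned len;        /* bytes of data in this page */
};

struct sg_kiobuf_sketch {
    int nr_frags;
    struct page_frag_sketch frags[SKETCH_MAX_FRAGS];
    int status;          /* a single completion status for the whole buffer */
};

/* Append one fragment; returns 0 on success, -1 if the buffer is full
 * or the fragment does not fit inside a single page. */
static int sg_add(struct sg_kiobuf_sketch *b, unsigned long pfn,
                  unsigned offset, unsigned len)
{
    if (b->nr_frags >= SKETCH_MAX_FRAGS || len == 0 ||
        offset >= SKETCH_PAGE_SIZE || len > SKETCH_PAGE_SIZE - offset)
        return -1;
    b->frags[b->nr_frags++] = (struct page_frag_sketch){ pfn, offset, len };
    return 0;
}

/* Total bytes described, summed over every fragment. */
static size_t sg_total_len(const struct sg_kiobuf_sketch *b)
{
    size_t total = 0;

    assert(b->nr_frags >= 0 && b->nr_frags <= SKETCH_MAX_FRAGS);
    for (int i = 0; i < b->nr_frags; i++)
        total += b->frags[i].len;
    return total;
}
```

Unaligned network fragments such as (0, 1500), (64, 1448) and (2000, 1500) all fit in one such structure, which is the attraction. The sketch, however, keeps a single status field for the whole buffer; reporting the completion of each segment separately, one of Tweedie's concerns, would need additional per-fragment state.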
The discussion then wandered into whether the venerable buffer head
structure could be made to do what kiobufs do. A number of people seem
to think that they could, especially if the block I/O API were
modified to make it easy to submit large chains of them as a single
operation. But no code for this use of buffer heads has, as yet, been
forthcoming.
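The buffer-head idea can be sketched in the same hypothetical style: each buffer describes one block, buffers are linked into a chain, and the whole chain is handed over at once. struct bh_sketch and requests_for_chain() are invented for this illustration and are far simpler than the kernel's struct buffer_head; the sketch only counts how many device requests a chain would become if runs of consecutive blocks were merged.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for a buffer head:
 * each one describes a single block, and buffers form a chain. */
struct bh_sketch {
    unsigned long blocknr;    /* block number on the device */
    unsigned size;            /* block size in bytes */
    struct bh_sketch *next;   /* next buffer in the chain, or NULL */
};

/* Count the device requests a chain would become if each run of
 * consecutive, equal-sized blocks were merged into one request. */
static int requests_for_chain(const struct bh_sketch *bh)
{
    const struct bh_sketch *prev = NULL;
    int requests = 0;

    for (; bh != NULL; prev = bh, bh = bh->next) {
        assert(bh->size > 0);
        if (prev == NULL || bh->blocknr != prev->blocknr + 1 ||
            bh->size != prev->size)
            requests++;   /* cannot extend the previous request */
    }
    return requests;
}
```

Under these assumptions, a chain covering blocks 100, 101, 102 and 200 would be submitted in one call but turn into two device requests.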
This issue, clearly, goes pretty deeply into how fundamental
operations are performed in the kernel. For this reason, the design
issues involved seem to touch a number of nerves. It will probably be
some time before a real resolution is reached; those who are
programming with kiobufs, however, should be prepared to see the
interface change...
The first public Linux-NTFS release is out; see [27]the announcement
for details. This release makes it possible to mount NT filesystems in
a writable mode under Linux. It's not yet perfect, however; when it
writes to an NTFS partition it leaves a bit of damage behind. For the
short term, it was evidently easier to provide a separate utility
("ntfsfix") which fixes things up afterwards.
Other patches and updates released this week include:
* David Miller continues to put out [28]frequent zero-copy
networking patches; the current one also contains the fix for the
Unix datagram bug.
* Jeff Merkey has released [29]version v1.1-7 of his driver for
Dolphin Scalable Coherent Interface adapters.
* [30]A new kernel development mailing list has been created by Ingo
Oeser; it is intended to host discussion of a wide range of
operating system techniques, not just those in use in the Linux
kernel.
* [31]devfs-v99.19 was posted by Richard Gooch; it is a backport of
the latest devfs code to the 2.2.18 kernel. He has also posted
[32]devfsd-v1.3.11, the devfs daemon that is needed to use a
devfs-enabled kernel.
* Rusty Russell has released [33]code to generate a graph of the
2.4.0 kernel. The code takes several hours to run, and on some
systems the graph has proven a little difficult to generate.
* Juergen Schneider has posted [34]a patch which adds an animated
boot logo to the framebuffer driver.
* Robert H. de Vries has posted [35]a new version of his POSIX
timers patch. This time around, Linus [36]responded that he will not
be applying the patch anytime soon, since he does not like the
implementation.
* The USAGI Project (USAGI = "UniverSAl playGround for Ipv6") has
[37]announced the second stable release of its system, which
features support for both the 2.2.18 and 2.4.0 kernels.
Section Editor: [38]Jonathan Corbet
February 8, 2001
For other kernel news, see:
* [39]Kernelnotes
* [40]Kernel traffic
* [41]Kernel Newsflash
* [42]Kernel Trap
Other resources:
* [43]Kernel Source Reference
* [44]L-K mailing list FAQ
* [45]Linux-MM
* [46]Linux Scalability Project
[48]Eklektix, Inc. Linux powered! Copyright © 2001 [49]Eklektix, Inc.,
all rights reserved
Linux (R) is a registered trademark of Linus Torvalds
References
1. http://lwn.net/
2. http://ads.tucows.com/click.ng/pageid=001-012-132-000-000-003-000-000-012
3. http://lwn.net/2001/0208/
4. http://lwn.net/2001/0208/security.php3
5. http://lwn.net/2001/0208/dists.php3
6. http://lwn.net/2001/0208/devel.php3
7. http://lwn.net/2001/0208/commerce.php3
8. http://lwn.net/2001/0208/press.php3
9. http://lwn.net/2001/0208/announce.php3
10. http://lwn.net/2001/0208/history.php3
11. http://lwn.net/2001/0208/letters.php3
12. http://lwn.net/2001/0208/bigpage.php3
13. http://lwn.net/2001/0201/kernel.php3
14. http://lwn.net/2001/0208/a/2.4.2pre1.php3
15. http://lwn.net/2001/0208/a/2.4.1ac5.php3
16. http://lwn.net/2001/0208/a/2.2.19pre8.php3
17. http://lwn.net/2001/0208/a/unix-datagram-bug.php3
18. http://lwn.net/2001/0208/a/unix-datagram-fix.php3
19. http://lwn.net/2001/0208/a/reiserfs-stability.php3
20. http://lwn.net/2001/0208/a/reiserfs-bugs.php3
21. http://lwn.net/2001/0208/a/kiobuf-pipe.php3
22. http://oss.sgi.com/projects/xfs/
23. http://lwn.net/2001/0208/a/im-kiobuf.php3
24. http://lwn.net/2001/0208/a/lt-alignment.php3
25. http://lwn.net/2001/0208/a/lt-layering.php3
26. http://lwn.net/2001/0208/a/st-layering.php3
27. http://lwn.net/2001/0208/a/linux-ntfs.php3
28. http://lwn.net/2001/0208/a/zerocopy.php3
29. http://lwn.net/2001/0208/a/pci-sci.php3
30. http://lwn.net/2001/0208/a/os-devel.php3
31. http://lwn.net/2001/0208/a/devfs-v99.19.php3
32. http://lwn.net/2001/0208/a/devfsd-v1.3.11.php3
33. http://lwn.net/2001/0208/a/kernel-graph.php3
34. http://lwn.net/2001/0208/a/fb-logo.php3
35. http://lwn.net/2001/0208/a/posix-timers.php3
36. http://lwn.net/2001/0208/a/lt-timers.php3
37. http://lwn.net/2001/0208/a/usagi.php3
38. mailto:lwn@lwn.net
39. http://www.kernelnotes.org/
40. http://kt.linuxcare.com/
41. http://www.atnf.csiro.au/~rgooch/linux/docs/kernel-newsflash.html
42. http://www.kerneltrap.com/
43. http://lksr.org/
44. http://www.tux.org/lkml/
45. http://www.linux.eu.org/Linux-MM/
46. http://www.citi.umich.edu/projects/linux-scalability/
47. http://lwn.net/2001/0208/dists.php3
48. http://www.eklektix.com/
49. http://www.eklektix.com/
--- ifmail v.2.14.os7-aks1
* Origin: Unknown (2:4615/71.10@fidonet)