RU.LINUX
From    : Sergey Lentsov (2:4615/71.10), 06 Dec 2001 17:11:21
To      : All
Subject : URL: http://www.lwn.net/2001/1206/kernel.php3
See also: [13]last week's Kernel page.
Kernel development
The current development kernel release is still 2.5.0. Linus's current
prepatch is [14]2.5.1-pre5. With recent prepatches, life has gotten
interesting; we have a true development kernel once again. Things that
have gone into 2.5.1 so far include:
* The new driver model implemented by Patrick Mochel. This code
implements a system-wide tree of all devices which will be helpful
for system configuration and power management tasks; it was
covered in the [15]October 25 LWN kernel page.
* The beginnings of the block layer thrash-up (see below).
* Richard Gooch's new devfs core code. The end result of this work
should be a more stable devfs, but it's giving some people
difficulties at the moment; approach with care.
In general, it pays to be careful with the 2.5.1 prepatches. Some of
the changes are truly disruptive, and a bit of instability is to be
expected for a while yet.
The current stable kernel release is 2.4.16. Marcelo ("[16]the wonder
penguin") has released [17]2.4.17-pre4, which contains a relatively
lengthy list of fixes and updates. Here, too, the new devfs code is
causing difficulties for some users.
On the 'design' of Linux. For those who haven't yet seen it elsewhere,
here's Linus's [18]'Linux wasn't designed' message that was widely
circulated. In another message, Linus [19]talked further on how he
thinks software gets built:
It's "directed mutation" on a microscopic level, but there is very
little macroscopic direction. There are lots of individuals with
some generic feeling about where they want to take the system (and
I'm obviously one of them), but in the end we're all a bunch of
people with not very good vision.
And that is GOOD.
It does seem that quite a bit of progress can be made, even with poor
vision.
Ripping up the block layer. It has long been understood that the 2.5
development series would include major changes to the block (disk) I/O
layer. The block code has no end of performance problems, especially
on high-end systems; it's also quite ugly in a number of places. So,
the integration of Jens Axboe's new block I/O code, while highly
disruptive, is a good thing.
Since 2.2, much of the block I/O subsystem has worked with a single
spinlock, called io_request_lock. If the system was trying to figure
out how to merge a request into a very long queue, or if a block
driver was slow in figuring out what it wanted to do, all other block
operations would have to stop and wait. This lock was serializing
operations which had nothing to do with each other, and was an obvious
scalability bottleneck.
With 2.5.1, that lock is no more; instead, each request queue (which,
in well-written drivers, corresponds to a single device) has its own lock.
This kind of change can be scary, since some drivers will have
depended on the global serialization enforced by io_request_lock; its
removal has the potential to create subtle and nasty bugs. It may be a
little while before all the block drivers are known to be safe.
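
The difference between one global lock and a lock per queue can be
sketched in user space. What follows is only an analogy built on POSIX
mutexes, not kernel code; the sketch_queue structure and its functions
are hypothetical names invented for this illustration.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical user-space model of a request queue (not a kernel type). */
struct sketch_queue {
    pthread_mutex_t lock;    /* protects this queue only */
    unsigned long pending;   /* number of requests waiting on this queue */
};

static void sketch_queue_init(struct sketch_queue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    q->pending = 0;
}

/* Queue one request. Only this queue's lock is taken, so a slow
 * operation on some other queue cannot hold this one up. */
static void sketch_queue_add(struct sketch_queue *q)
{
    pthread_mutex_lock(&q->lock);
    q->pending++;
    pthread_mutex_unlock(&q->lock);
}

/* Read the number of pending requests under the queue's own lock. */
static unsigned long sketch_queue_pending(struct sketch_queue *q)
{
    unsigned long n;

    pthread_mutex_lock(&q->lock);
    n = q->pending;
    pthread_mutex_unlock(&q->lock);
    return n;
}
```

With a single shared mutex in place of the per-queue one, every call to
sketch_queue_add() would contend with every other, whichever queue it
touched; that is the serialization the paragraphs above describe.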
Another problem with the old block code was its use of the "buffer
head" ("bh") structure as the building block of the request queue.
Higher-level code would go to some lengths to create large, contiguous
block I/O requests, which would then be fragmented into a large number
of single-block requests, each with its own buffer head. The elevator
code then had the task of trying to merge the requests back together
again.
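
To give a feel for the merging job, here is a minimal sketch of
coalescing back-to-back requests. It is not the kernel's elevator
algorithm; it assumes the requests are already sorted by starting
sector, and the sketch_req structure and sketch_merge() function are
hypothetical names used only here.

```c
#include <stddef.h>

/* Hypothetical, simplified request: a run of consecutive sectors. */
struct sketch_req {
    unsigned long sector;   /* first sector of the request */
    unsigned long nr;       /* number of sectors covered */
};

/* Merge requests that are contiguous on disk, in place.
 * Assumes reqs[] is sorted by starting sector.
 * Returns the number of requests left after merging. */
static size_t sketch_merge(struct sketch_req *reqs, size_t n)
{
    size_t out = 0;   /* index of the request currently being grown */
    size_t i;

    if (n == 0)
        return 0;

    for (i = 1; i < n; i++) {
        if (reqs[out].sector + reqs[out].nr == reqs[i].sector)
            reqs[out].nr += reqs[i].nr;   /* contiguous: extend it */
        else
            reqs[++out] = reqs[i];        /* gap: start a new request */
    }
    return out + 1;
}
```

Feeding it single-sector requests for sectors 100, 101, 102 and 200
leaves two requests: one covering three sectors from 100, and one for
sector 200.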
Buffer heads are now a thing of the past, at least as a visible part
of the block I/O interface. Block I/O requests are now described by a
new bio structure which, in turn, contains a list of bio_vec
structures describing the data to be transferred. The bh structure
included a virtual pointer to the data to be transferred; the new
structures, instead, contain struct page pointers directly into the
system memory map.
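
The shape of the new structures can be pictured roughly as follows.
This is a simplified user-space model, not a copy of the 2.5.1 headers:
the sketch_ names and the exact set of fields are assumptions made for
illustration, and the real definitions carry considerably more state.

```c
#include <stddef.h>

/* Opaque stand-in for the kernel's struct page. */
struct sketch_page;

/* One segment of a transfer: a piece of a single page. */
struct sketch_bio_vec {
    struct sketch_page *page;   /* page holding the data, not a virtual address */
    unsigned int offset;        /* where the data starts within the page */
    unsigned int len;           /* number of bytes in this segment */
};

/* One block I/O request: a starting sector plus a list of segments. */
struct sketch_bio {
    unsigned long sector;          /* first sector on the device */
    struct sketch_bio_vec *vecs;   /* array of segments */
    unsigned short vcnt;           /* number of entries in vecs[] */
};

/* Total size of the transfer, summed over all segments. */
static size_t sketch_bio_bytes(const struct sketch_bio *bio)
{
    size_t total = 0;
    unsigned short i;

    for (i = 0; i < bio->vcnt; i++)
        total += bio->vecs[i].len;
    return total;
}
```

Because each segment names a page rather than a kernel virtual address,
a large request can be described as one unit even when its pages have
no mapping in kernel space.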
Much of the kernel has moved toward working with page structures,
often as a result of the challenges of dealing with high memory, which
has no virtual mapping into kernel space. Block drivers will now have
to deal with high memory directly, but support code has been provided
to make that easier. The advantages of working with page structures
are worth the trouble; in particular, handling large, clustered
requests from the raw I/O layer (or the pending asynchronous I/O patch
by Ben LaHaise) will be much easier.
Also included are the block-highmem patches, which enable DMA
operations directly to and from high memory. With the 2.4 kernel, such
operations require copying data via "bounce buffers" in low memory.
Bounce buffers can create severe performance problems on large-memory
systems, and they are (usually) entirely unnecessary.
Finally, a whole set of support code has been added which hides much
of the structure of the request queue from block drivers. Included is
a nice routine for setting up DMA requests easily. The result is that
all block drivers must be updated, but the resulting code should be
simpler.
The block work is far from done, however; quite a bit of work is still
pending. Jens has already [20]stated his plan to break all of the
block drivers again shortly. Upcoming changes include moving the
building of SCSI-like commands into the generic block layer, and
running ioctl() operations through the request queue so that they are
automatically serialized with the I/O operations.
For more information, see [21]Jens's writeup of the block I/O changes
so far, and [22]Suparna Bhattacharya's notes on the LSE web site.
Merging the new kbuild. Back at the [23]Kernel Summit, it was agreed
that one of the first things to happen in 2.5 would be the integration
of the new kbuild code. Block I/O has jumped in first, but kbuild
remains on the agenda. To push things forward, Keith Owens has
[24]proposed a schedule for the merging of kbuild. It calls for the
new build code to be added in 2.5.2-pre1, and the old system to be
ripped out in -pre2. The original plan called for deferring the
integration of CML2 until 2.5.3, but Eric Raymond was less than
thrilled with the idea. So a [25]revised version of the timeline has
CML2 going in simultaneously with kbuild. There are just a couple of
obstacles to overcome, like the fact that the two do not currently
work together. One assumes these little details can be dealt with.
There has been little comment on the plan to integrate the new kbuild;
it does not appear to be a controversial change (though there is a
little grumbling about the new kbuild being slower).
Most speakers, when giving a talk, try to be well tuned to signals
from the audience. So, when your editor was addressing folks at Linux
Kongress about 2.5 changes, the sound of vomiting from the seats got
his attention. The subject at hand was, of course, CML2. This
development remains controversial, and the talk of integrating it with
kbuild started up the same old flame wars.
Said wars have been covered in this space in the past, and there is
very little to add. In theory, Linus has said he will merge CML2 and
the topic should be moot. Eric Raymond did not help things, however,
with [26]his statement that he plans to try to get Marcelo to
integrate CML2 into the 2.4 tree as well. This idea, at least, is not
controversial - almost nobody seems to think it's a good idea. The 2.4
kernel just does not need that sort of change.
With regard to 2.5, the main stumbling point still appears to be the
use of Python 2 as the implementation language. One would think people
could just install Python and be done with it, but it's apparently not
so simple. Most of the dissenters are just grumbling, but there are a
couple of other efforts out there. Greg Banks has a [27]CML2 in C
project going, though progress has pretty well stopped in recent
months. Jan Harkes, instead, has put together [28]a patch which ports
the CML2 code to Python 1.5. Since the older Python is available on
more of the older systems, one would hope this patch might help reduce
the complaining somewhat.
But, then, as devfs shows, some developments never seem to reach a
point of being accepted by everybody. (Current versions of these
patches are [29]kbuild 1.1.0 and [30]CML2 1.9.4).
Eliminating sleep_on. For years, the standard way to put a process to
sleep within the kernel has been the sleep_on() function or its
variants. sleep_on() simply blocks the calling process until somebody
explicitly wakes it (or, in some cases, a signal or timeout happens).
On SMP systems, however, sleep_on() has a serious problem. Consider a
typical usage:
    if (something not ready)
        sleep_on(&my_wait_queue);
If the "something" becomes ready between the two lines of code, the
wakeup event will be missed and the process may sleep for much longer
than intended.
Workarounds for this problem have existed for a long time. The
wait_event() macros handle this case without races; often semaphores
or the newish "completion event" interface can be used. If all else
fails, a relatively complicated "manual sleep" can be coded. All of
these techniques are used in the kernel, but code that calls
sleep_on() still exists.
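
The idea behind the race-free alternatives is that the test of the
condition and the decision to sleep must not be separable by a wakeup.
The sketch below shows that pattern in user space with a POSIX
condition variable. It is an analogy only, not the kernel's
wait_event() implementation, and the sketch_event names are
hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space event: a flag guarded by a mutex. */
struct sketch_event {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool ready;               /* the "something" being waited for */
};

static void sketch_event_init(struct sketch_event *ev)
{
    pthread_mutex_init(&ev->lock, NULL);
    pthread_cond_init(&ev->cond, NULL);
    ev->ready = false;
}

/* Sleep until the event is ready. The flag is tested with the lock
 * held, and pthread_cond_wait() releases the lock and sleeps as one
 * atomic step, so a wakeup cannot slip in between test and sleep. */
static void sketch_event_wait(struct sketch_event *ev)
{
    pthread_mutex_lock(&ev->lock);
    while (!ev->ready)
        pthread_cond_wait(&ev->cond, &ev->lock);
    pthread_mutex_unlock(&ev->lock);
}

/* Mark the event ready and wake all sleepers, under the same lock. */
static void sketch_event_signal(struct sketch_event *ev)
{
    pthread_mutex_lock(&ev->lock);
    ev->ready = true;
    pthread_cond_broadcast(&ev->cond);
    pthread_mutex_unlock(&ev->lock);
}
```

If sketch_event_signal() runs before sketch_event_wait() is called, the
waiter sees the flag already set and never sleeps; with the bare
sleep_on() pattern, that same ordering is the one that loses the
wakeup.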
The plan for some time has been to remove sleep_on() in the 2.5
series, on the theory that there is no safe way to call it. Now that
patches are going in, people have begun to ask when this removal might
take place. The answer, for now, is [31]a patch from David Woodhouse.
It does not yet go so far as completely removing the function; instead
it adds some checks which detect (and complain about) unsafe calls. It
is a gradual approach, but the intent remains the same: eventually
sleep_on() and friends will go away, and any code that still calls
them will have to be updated.
Incremental prepatches. H. Peter Anvin has [32]announced a
much-requested feature for the kernel.org archives: incremental
prepatches. Posted prepatches are relative to the last official kernel
release; users wishing to go from one prepatch to another have to
restart with a clean kernel, or explicitly back out the previous
prepatch. With the new scheme, it is necessary only to download the
(usually smaller) incremental patch and apply that. The incremental
patches will also make it easier to see exactly what has changed
between prepatches.
Integrating ALSA. The [33]Advanced Linux Sound Architecture project
has been working since [34]early 1998 to build a better sound
subsystem for the Linux kernel. Some people were surprised that ALSA
was not integrated into 2.4, but the fact is that the project never
proposed its code for that release. The ALSA hackers have been taking
their time and trying to get it right.
Now, however, it appears that the time has come. ALSA maintainer
Jaroslav Kysela has [35]indicated that he and the code are ready, and
Alan Cox has [36]encouraged him to submit it. The last call belongs to
Linus, of course, but chances are good that ALSA will find its way
into a 2.5 kernel before too long. It will probably live alongside the
OSS drivers for a while, but, in the long term, it seems certain that
OSS will eventually be removed.
Other patches and updates released this week include:
* Peter Braam has [37]released version 1.0.6-test1 of the InterMezzo
filesystem. There is also an InterMezzo roadmap available for
those interested in where this distributed filesystem is going.
* Larry McVoy has posted [38]a partial description of his
long-standing "ccCluster" idea. Worth a read for a different
approach to multiprocessor systems.
* Christoph Rohland has posted [39]a document for the tmpfs
filesystem, intended for the kernel documentation directory.
* IBM has released [40]version 1.0.10 of the JFS journaling
filesystem.
* Richard Gooch has released a pile of devfs updates, including
[41]devfsd-v1.3.20, [42]devfs-v99.21 (for 2.2 kernels),
[43]devfs-v199.3 (for 2.4) and [44]devfs-v203 (for 2.5).
* Davide Libenzi has posted [45]a patch which implements "task
struct coloring." Coloring staggers the alignment of task
structures so that they do not all land on the same cache line
(which is currently the case). The result should be improved
kernel performance, especially on SMP systems. A [46]later version
of the patch also adds kernel stack coloring.
* Bert Hubert has posted [47]a set of documents describing the
kernel's network traffic control capabilities. Traffic control has
been present since 2.2, and it provides some very nice features,
but lack of good documentation has limited its usage. This work is
a welcome step in the right direction.
* [48]Version v1.13 of the Dolphin PCI-SCI driver has been released
by Jeff Merkey.
* Keith Owens has released [49]kdb v1.9 for the 2.4.16 kernel.
* [50]ext3 0.9.16 for 2.4 kernels was released by Andrew Morton.
* The international kernel patch is back: a [51]beta version for
2.4.16 was announced by Herbert Valerio Riedel.
* Nathan Scott has [52]posted a new version of the extended
attributes interface.
* A patch improving the performance of kernel statistics counters
was [53]posted by Ravikiran G Thirumalai.
* Ian Stewart has [54]announced a new release of the AC'97
"linmodem" driver.
Section Editor: [55]Jonathan Corbet
December 6, 2001
For other kernel news, see:
* [56]Kernel traffic
* [57]Kernel Newsflash
* [58]Kernel Trap
Other resources:
* [59]Kernel Source Reference
* [60]L-K mailing list FAQ
* [61]Linux-MM
* [62]Linux Scalability Effort
* [63]Kernel Newbies
* [64]Linux Device Drivers
Copyright © 2001 [67]Eklektix, Inc., all rights reserved
Linux (R) is a registered trademark of Linus Torvalds
References
1. http://lwn.net/
2. http://ads.tucows.com/click.ng/pageid=001-012-132-000-000-003-000-000-012
3. http://lwn.net/2001/1206/
4. http://lwn.net/2001/1206/security.php3
5. http://lwn.net/2001/1206/dists.php3
6. http://lwn.net/2001/1206/devel.php3
7. http://lwn.net/2001/1206/commerce.php3
8. http://lwn.net/2001/1206/press.php3
9. http://lwn.net/2001/1206/announce.php3
10. http://lwn.net/2001/1206/history.php3
11. http://lwn.net/2001/1206/letters.php3
12. http://lwn.net/2001/1206/bigpage.php3
13. http://lwn.net/2001/1129/kernel.php3
14. http://lwn.net/2001/1206/a/2.5.1-pre5.php3
15. http://lwn.net/2001/1025/kernel.php3
16. http://marcelothewonderpenguin.com/
17. http://lwn.net/2001/1206/a/2.4.17-pre4.php3
18. http://lwn.net/2001/1206/a/no-design.php3
19. http://lwn.net/2001/1206/a/mutation.php3
20. http://lwn.net/2001/1206/a/ja-not-kidding.php3
21. http://lwn.net/2001/1206/a/bio-writeup.php3
22. http://lse.sourceforge.net/io/bionotes.txt
23. http://lwn.net/2001/features/KernelSummit/
24. http://lwn.net/2001/1206/a/kbuild-plan.php3
25. http://lwn.net/2001/1206/a/kbuild-plan2.php3
26. http://lwn.net/2001/1206/a/cml2-2.4.php3
27. http://lwn.net/2001/1206/a/cml2-in-c.php3
28. http://lwn.net/2001/1206/a/cml2-in-python1.php3
29. http://lwn.net/2001/1206/a/kbuild.php3
30. http://lwn.net/2001/1206/a/cml.php3
31. http://lwn.net/2001/1206/a/sleep_on.php3
32. http://lwn.net/2001/1206/a/incremental.php3
33. http://www.alsa-project.org/
34. http://lwn.net/1998/0226/a/elsa.html
35. http://lwn.net/2001/1206/a/alsa.php3
36. http://lwn.net/2001/1206/a/ac-alsa.php3
37. http://lwn.net/2001/1206/a/intermezzo.php3
38. http://lwn.net/2001/1206/a/ccCluster.php3
39. http://lwn.net/2001/1206/a/tmpfs.php3
40. http://lwn.net/2001/1206/a/jfs.php3
41. http://lwn.net/2001/1206/a/devfsd-v1.3.20.php3
42. http://lwn.net/2001/1206/a/devfs-v99.21.php3
43. http://lwn.net/2001/1206/a/devfs-v199.3.php3
44. http://lwn.net/2001/1206/a/devfs-v203.php3
45. http://lwn.net/2001/1206/a/task-coloring.php3
46. http://lwn.net/2001/1206/a/kernel-stack.php3
47. http://lwn.net/2001/1206/a/tc-doc.php3
48. http://lwn.net/2001/1206/a/pci-sci.php3
49. http://lwn.net/2001/1206/a/kdb.php3
50. http://lwn.net/2001/1206/a/ext3.php3
51. http://lwn.net/2001/1206/a/ikp.php3
52. http://lwn.net/2001/1206/a/ext-attrs.php3
53. http://lwn.net/2001/1206/a/counters.php3
54. http://lwn.net/2001/1206/a/ac97.php3
55. mailto:lwn@lwn.net
56. http://kt.zork.net/
57. http://www.atnf.csiro.au/~rgooch/linux/docs/kernel-newsflash.html
58. http://www.kerneltrap.com/
59. http://lksr.org/
60. http://www.tux.org/lkml/
61. http://www.linux.eu.org/Linux-MM/
62. http://lse.sourceforge.net/
63. http://www.kernelnewbies.org/
64. http://www.xml.com/ldd/chapter/book/index.html
65. http://lwn.net/2001/1206/dists.php3
66. http://www.eklektix.com/
67. http://www.eklektix.com/
--- ifmail v.2.14.os7-aks1
* Origin: Unknown (2:4615/71.10@fidonet)