RU.LINUX
From: Sergey Lentsov 2:4615/71.10
Date: 13 Dec 2001 17:11:08
To: All
Subject: URL: http://www.lwn.net/2001/1213/kernel.php3
See also: [13]last week's Kernel page.
Kernel development
The current development kernel release is still 2.5.0. The current
2.5.1 prepatch is [14]2.5.1-pre10. On the surface, little has changed
over the last week; most of the changelog entries seem to be some
variant of "Jens Axboe: bio work." The thrashing of the block layer is
taking some time to stabilize - as is to be expected from a change of
this magnitude. The last of the disruptive block I/O changes have not
yet hit the kernel, so this situation could persist for a while yet.
Also included in this prepatch is a Super-H architecture update, some
network driver work, an NTFS update, USB fixes, memory pools (see
below), and the inevitable superblock cleanup patches from Al Viro.
The current stable kernel release is 2.4.16. Marcelo's prepatches are
up to [15]2.4.17-pre8; he has stated that the next prepatch will be
the first 2.4.17 release candidate. Marcelo's stated plan is to have
the final release be the same as the last release candidate; the hope
is to be done with surprises caused by last-minute patches.
Memory pools are a new addition to the kernel as of 2.5.1-pre10. The
idea behind "mempools," which were implemented by Ingo Molnar, is to
provide a memory allocation function that is guaranteed to work, even
when memory is tight. Some places in the kernel cannot afford to have
memory allocations fail. For example, memory pressure can force the
system to swap pages out, but that swap operation will require memory
to be executed. If the memory to set up the swap is not available, the
system comes to a halt.
Memory pools work by simply preallocating a bunch of memory and
keeping it aside until it's needed. The actual allocation and freeing
of memory is handled by somebody else (the idea seems to be for
mempools to be layered over the slab allocator); all mempools do is
stock up ahead of time. Their use will thus increase the kernel's
memory consumption (by the amount of memory that is set aside). For
certain critical paths, though, they should help to improve the
stability of the system under heavy load.
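The preallocation idea can be sketched in user-space C. This is only an illustration of the technique described above, not the kernel's mempool code: the names reserve_pool, pool_create(), pool_alloc(), pool_free(), and backing_alloc() are hypothetical, malloc() stands in for the slab allocator, and the backing_alloc_fails flag is a made-up switch for simulating memory pressure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical switch: set to 1 to simulate memory pressure. */
int backing_alloc_fails = 0;

/* Stand-in for the underlying allocator (the slab allocator in the kernel). */
void *backing_alloc(size_t size)
{
    return backing_alloc_fails ? NULL : malloc(size);
}

/* Hypothetical reserve pool; not the kernel's actual data structure. */
struct reserve_pool {
    size_t elem_size;  /* size of each element */
    size_t min_nr;     /* number of elements to keep in reserve */
    size_t curr_nr;    /* reserved elements currently available */
    void **elements;   /* stack of preallocated elements */
};

void pool_destroy(struct reserve_pool *pool)
{
    if (!pool)
        return;
    while (pool->curr_nr > 0)
        free(pool->elements[--pool->curr_nr]);
    free(pool->elements);
    free(pool);
}

/* Stock up ahead of time: preallocate min_nr elements. */
struct reserve_pool *pool_create(size_t min_nr, size_t elem_size)
{
    struct reserve_pool *pool = malloc(sizeof(*pool));

    if (!pool)
        return NULL;
    pool->elem_size = elem_size;
    pool->min_nr = min_nr;
    pool->curr_nr = 0;
    pool->elements = calloc(min_nr ? min_nr : 1, sizeof(void *));
    if (!pool->elements) {
        free(pool);
        return NULL;
    }
    while (pool->curr_nr < min_nr) {
        void *elem = backing_alloc(elem_size);

        if (!elem) {
            pool_destroy(pool);
            return NULL;
        }
        pool->elements[pool->curr_nr++] = elem;
    }
    return pool;
}

/* Try the normal allocator first; dip into the reserve only if it fails. */
void *pool_alloc(struct reserve_pool *pool)
{
    void *elem = backing_alloc(pool->elem_size);

    assert(pool != NULL);
    if (elem)
        return elem;
    if (pool->curr_nr > 0)
        return pool->elements[--pool->curr_nr];
    /* Reserve exhausted. This sketch gives up; a version that truly
     * never fails would have to wait here for an element to be freed. */
    return NULL;
}

/* Refill the reserve before handing memory back to the allocator. */
void pool_free(struct reserve_pool *pool, void *elem)
{
    if (!elem)
        return;
    if (pool->curr_nr < pool->min_nr) {
        pool->elements[pool->curr_nr++] = elem;
        return;
    }
    free(elem);
}
```

With a reserve of two elements, two calls to pool_alloc() still succeed while backing_alloc_fails is set; the cost is that those two elements stay allocated for the life of the pool.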
Coming soon: bigger device numbers. One of the long-stated plans for
2.5 is to increase the size of dev_t, the type which is used to
represent device numbers. This type, as it stands now, has roots all
the way back to the original Unix systems - it is a 16-bit quantity,
with eight bits for the major number, and eight for the minor. It is
inadequate for modern systems, which can have, literally, thousands of
devices on them. So dev_t has to grow.
Linus laid out the plan some time ago (see [16]the March 29, 2001 LWN
Kernel Page): dev_t would grow to 32 bits. Of those, twelve would
designate the major number, and 20 the minor number. A number of
people would rather see 64-bit device numbers, but Linus is opposed to
that.
Changing device numbers raises a number of interesting compatibility
problems. Consider, for example, a tar or dump archive containing a
/dev directory. The archive contains the device numbers for every
entry in that directory; if those numbers stop working after the dev_t
change, everybody's backups have just been rendered invalid. System
administrators, when faced with that prospect, tend to break out in a
cold sweat, overindulge in beer, and switch to BSD.
Fortunately, that particular problem [17]has a solution. In the new
scheme, the major number zero is set aside as a marker for "legacy"
device numbers. Any 32-bit device number with a major number of zero
is interpreted as an old-style number and "just works." A change to
the C library will be required before applications can exchange larger
device numbers with the kernel, but the change should be relatively
smooth beyond that.
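As a rough illustration of how such a scheme could be decoded, consider the sketch below. The bit layout is an assumption: the plan only says twelve bits of major and twenty of minor, so placing the major number in the high bits is a guess, and the names dev_parts, make_dev32(), and decode_dev32() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: high 12 bits major, low 20 bits minor. */
#define NEW_MINOR_BITS 20u
#define NEW_MINOR_MASK ((1u << NEW_MINOR_BITS) - 1u)
#define NEW_MAJOR_MAX  0xfffu

struct dev_parts {
    uint32_t major;
    uint32_t minor;
};

/* Build a new-style 32-bit device number. */
uint32_t make_dev32(uint32_t major, uint32_t minor)
{
    assert(major <= NEW_MAJOR_MAX && minor <= NEW_MINOR_MASK);
    return (major << NEW_MINOR_BITS) | minor;
}

/* Split a 32-bit device number, honoring the legacy marker. */
struct dev_parts decode_dev32(uint32_t dev)
{
    struct dev_parts p;
    uint32_t major = dev >> NEW_MINOR_BITS;

    if (major == 0) {
        /* Major zero marks an old-style 16-bit number: eight bits of
         * major, eight of minor. Bits 16-19 are ignored in this sketch. */
        p.major = (dev >> 8) & 0xffu;
        p.minor = dev & 0xffu;
    } else {
        p.major = major;
        p.minor = dev & NEW_MINOR_MASK;
    }
    return p;
}
```

Under these assumptions, the old number 0x0801 found in an existing archive still decodes as major 8, minor 1, while make_dev32(300, 70000) yields a number that the old format could not have represented.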
On the kernel side, however, life could be more interesting. Kernel
developers really do try to avoid breaking applications, but they are
more willing to tear things up inside the kernel. Especially in a
development series.
The kernel version of the device number type is kdev_t. It has long
been meant to be an opaque type, but it's really just dev_t in kernel
drag. People had assumed that kdev_t would grow along with dev_t, but
[18]that's not what Linus has in mind. Linus wants kdev_t to go away
entirely. All of the interfaces in the kernel which currently use that
type will be changed to take a pointer to an appropriate structure.
Block drivers, thus, will see a pointer to a struct block_device
rather than a device number. Some sort of struct char_device will also
probably be created to handle a similar role.
In other words, the kernel will no longer use device numbers at all,
except as a means of communication with user space. Internally, device
numbers will not exist. A lot of kernel code is going to have to
change to make this happen; one does not have to look very hard to see
more unstable development kernel releases in the future - see, for
example, [19]Al Viro's description of some of the issues involved.
But, then, that's what development kernels are for.
Where do important changes get tested? One would think that, now that
we finally have a development kernel again, non-trivial changes would
show up there before being merged into the stable 2.4 series. Thus,
there was [20]some surprise when support for "hyperthreading" on
Pentium 4 processors went into 2.4.17-pre5. That support still does
not exist in 2.5, and has thus not seen the wider testing that it
could experience there.
The reasoning behind putting this change into 2.4, as [21]explained by
Alan Cox, is interesting. The claim that normal users will not be
affected by the change is standard. But Alan also points out that, due
to the ongoing block I/O work, the 2.5 series "isn't usable for that
kind of thing in the near future." So, if a feature like
hyperthreading is to be tried out, it must be added to the stable
kernel series.
Things will get better as the block layer stabilizes - at least, until
the next set of disruptive changes goes in. Until then, it's a bit
ironic that the only place to test certain kinds of changes is the
stable kernel series.
(Hyperthreading, for those who are interested, is the hardware trick
of making a single processor appear to be multiple virtual processors
as a way of keeping busy while waiting for memory accesses. See
[22]Intel's Hyperthreading page for details.)
Work on the scheduler is also coming to a boil. It is a widely (though
not universally) held belief that the Linux scheduler is overdue for a
rewrite in 2.5. [23]Quoting Alan Cox again:
Its a great scheduler for a single or dual processor 486/pentium
type box running a home environment. It gets a bit flaky by the
time its running oracle on a 4 way, it gets very flaky by the time
its running lotus back ends on an 8 way. It doesn't take lunacy
like java, broken JVM implementations and volcanomark to make it go
astray.
The scheduler's performance on larger systems and under load has been
shown to be inadequate numerous times. But there is little agreement
on what should replace it.
Mike Kravetz and company at IBM have posted [24]a new multi-queue
scheduler patch for the 2.5.0 kernel. This scheduler cuts down on
scheduling time by maintaining a separate run queue for each processor
on the system. It tries to improve performance while maintaining the
same behavior as the existing scheduler.
Alan Cox has [25]a new scheduler of his own which works by maintaining
a set of eight (currently) run queues for each processor. Picking a
process to run is just a matter of taking the first one off the
highest priority queue.
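The selection step in a design like this one can be sketched in a few lines of C. This is not taken from Alan's patch: the structure and function names are hypothetical, and treating queue 0 as the highest priority is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define NR_QUEUES 8  /* eight queues per processor, as described above */

struct task {
    int pid;
    struct task *next;
};

struct queue {
    struct task *head;
    struct task *tail;
};

/* One of these would exist for each processor. */
struct cpu_runqueues {
    struct queue q[NR_QUEUES];  /* assumption: index 0 is highest priority */
};

void rq_init(struct cpu_runqueues *rq)
{
    for (int i = 0; i < NR_QUEUES; i++)
        rq->q[i].head = rq->q[i].tail = NULL;
}

/* Append a runnable task to the tail of the queue for its priority. */
void rq_enqueue(struct cpu_runqueues *rq, struct task *t, int prio)
{
    struct queue *q;

    assert(prio >= 0 && prio < NR_QUEUES);
    q = &rq->q[prio];
    t->next = NULL;
    if (q->tail)
        q->tail->next = t;
    else
        q->head = t;
    q->tail = t;
}

/* Take the first task off the highest-priority non-empty queue. */
struct task *rq_pick_next(struct cpu_runqueues *rq)
{
    for (int i = 0; i < NR_QUEUES; i++) {
        struct queue *q = &rq->q[i];

        if (q->head) {
            struct task *t = q->head;

            q->head = t->next;
            if (!q->head)
                q->tail = NULL;
            t->next = NULL;
            return t;
        }
    }
    return NULL;  /* nothing runnable on this processor */
}
```

The appeal is that the cost of picking a task is bounded by the number of queues rather than by the number of runnable processes.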
Finally, Davide Libenzi has [26]a scheduler patch which implements a
per-CPU run queue and some load balancing code.
All of these projects share the same goals: cut down on scheduling
overhead, work harder to keep processes from moving between
processors, and retain good performance in low-load situations. The
low-load performance is considered critical: it is, after all, the
normal situation for most systems, and the current scheduler handles
it well. No patch which impairs low-load performance is likely to get
too far.
The hyperthreading issue mentioned above is likely to throw a new set
of complications into the mix. A processor which does hyperthreading
looks like two independent CPUs, but it should not be scheduled as
such - it is better to divide processes across real (hardware)
processors first. Expect scheduling to be a hot topic for some time.
Linux Advanced Routing & Traffic Control Documentation Project. Bert
Hubert has been working for some time on the documentation of the
advanced Linux routing features. The Linux traffic control mechanism
has been available since the 2.1 days, but is greatly underutilized.
The quality of the available documentation has not helped here. The
code is great, but it's hard to figure out how to use it. So an effort
to shine some light in that direction is more than welcome.
Bert's work has now grown into the [27]Linux Advanced Routing &
Traffic Control documentation project, and a great deal of information
is available there. The latest addition is the [28]tc-cbq man page:
"Nearly 2500 words, 8 printed pages, of nearly unintelligible
gobledygook, explaining mostly how CBQ works." Good stuff.
Other patches and updates released this week include:
* Andrea Arcangeli has [29]made available (as a tarball containing a
magicpoint file) the slides from his PLUTO talk on the new VM
implementation. This is the first documentation that has been made
available on the new code. We have also made the slides available
[30]in HTML format.
* Daniel Phillips has [31]posted his ALS paper on ext2 directory
indexes, along with a wealth of benchmark results. Worth a look if
this work interests you at all.
* [32]Kernel Traffic #145 (December 10) is available.
* Rusty Russell has [33]posted a patch making it easy for kernel
code to set up per-CPU data areas.
* The [34]ltp-20011206 release is available from the Linux Test
Project.
* The latest User-mode Linux release from Jeff Dike is
[35]0.53-2.4.16.
* A new [36]preemptible kernel patch is available from Robert Love.
* Karim Yaghmour has [37]released version 0.9.5pre4 of the Linux
Trace Toolkit.
* Jason Baietto has [38]released a set of "multiprocessor control
interface" programs. These allow users to bind tasks to processors
and perform other, similar operations.
* Ben LaHaise has posted [39]a patch which adds his kvec type
(essentially a lightweight replacement for kiobufs) to the kernel.
kvecs are needed for his asynchronous I/O work, among other
things. Also available is [40]this patch, which works the kvec
structure into the new block I/O code.
* Eric Raymond has released [41]CML2 1.9.7.
* The [42]2001_12_10 release of the security module code is
available. Also available is [43]a new security module adding
labeled IPv4 networking to SELinux.
* Lennert Buytenhek has [44]released version 0.0.4pre1 of his
bridging netfilter code.
* Jozsef Kadlecsik has been [45]added to the netfilter core team.
Section Editor: [46]Jonathan Corbet
December 13, 2001
For other kernel news, see:
* [47]Kernel traffic
* [48]Kernel Newsflash
* [49]Kernel Trap
Other resources:
* [50]Kernel Source Reference
* [51]L-K mailing list FAQ
* [52]Linux-MM
* [53]Linux Scalability Effort
* [54]Kernel Newbies
* [55]Linux Device Drivers
[57]Eklektix, Inc. Linux powered! Copyright © 2001 [58]Eklektix, Inc.,
all rights reserved
Linux (R) is a registered trademark of Linus Torvalds
References
1. http://lwn.net/
2. http://ads.tucows.com/click.ng/pageid=001-012-132-000-000-003-000-000-012
3. http://lwn.net/2001/1213/
4. http://lwn.net/2001/1213/security.php3
5. http://lwn.net/2001/1213/dists.php3
6. http://lwn.net/2001/1213/devel.php3
7. http://lwn.net/2001/1213/commerce.php3
8. http://lwn.net/2001/1213/press.php3
9. http://lwn.net/2001/1213/announce.php3
10. http://lwn.net/2001/1213/history.php3
11. http://lwn.net/2001/1213/letters.php3
12. http://lwn.net/2001/1213/bigpage.php3
13. http://lwn.net/2001/1206/kernel.php3
14. http://lwn.net/2001/1213/a/2.5.1-pre10.php3
15. http://lwn.net/2001/1213/a/2.4.17-pre8.php3
16. http://lwn.net/2001/0329/kernel.php3
17. http://lwn.net/2001/1213/a/hpa-dev_t.php3
18. http://lwn.net/2001/1213/a/lt-kdev_t.php3
19. http://lwn.net/2001/1213/a/av-kdev_t.php3
20. http://lwn.net/2001/1213/a/ht.php3
21. http://lwn.net/2001/1213/a/ac-2.5.php3
22. http://developer.intel.com/technology/hyperthread/
23. http://lwn.net/2001/1213/a/ac-scheduler.php3
24. http://lwn.net/2001/1213/a/mq.php3
25. http://lwn.net/2001/1213/a/8queue.php3
26. http://lwn.net/2001/1213/a/dl-scheduler.php3
27. http://ds9a.nl/lartc/
28. http://lwn.net/2001/1213/a/cbq.php3
29. http://lwn.net/2001/1213/a/aa-vm.php3
30. http://lwn.net/2001/1213/aa-vm-talk/
31. http://lwn.net/2001/1213/a/directory-index.php3
32. http://kt.zork.net/kernel-traffic/kt20011210_145.html
33. http://lwn.net/2001/1213/a/per-cpu.php3
34. http://lwn.net/2001/1213/a/ltp.php3
35. http://lwn.net/2001/1213/a/uml.php3
36. http://lwn.net/2001/1213/a/pk.php3
37. http://lwn.net/2001/1213/a/ltt.php3
38. http://lwn.net/2001/1213/a/mci.php3
39. http://lwn.net/2001/1213/a/kvec.php3
40. http://lwn.net/2001/1213/a/kvec-bio.php3
41. http://lwn.net/2001/1213/a/cml.php3
42. http://lwn.net/2001/1213/a/sm.php3
43. http://lwn.net/2001/1213/a/selopt.php3
44. http://lwn.net/2001/1213/a/bridge-netfilter.php3
45. http://lwn.net/2001/1213/a/nct.php3
46. mailto:lwn@lwn.net
47. http://kt.zork.net/
48. http://www.atnf.csiro.au/~rgooch/linux/docs/kernel-newsflash.html
49. http://www.kerneltrap.com/
50. http://lksr.org/
51. http://www.tux.org/lkml/
52. http://www.linux.eu.org/Linux-MM/
53. http://lse.sourceforge.net/
54. http://www.kernelnewbies.org/
55. http://www.xml.com/ldd/chapter/book/index.html
56. http://lwn.net/2001/1213/dists.php3
57. http://www.eklektix.com/
58. http://www.eklektix.com/
--- ifmail v.2.14.os7-aks1
* Origin: Unknown (2:4615/71.10@fidonet)