RU.LINUX
From    : Sergey Lentsov  2:4615/71.10  19 Jul 2001 16:52:04
To      : All
Subject : URL: http://www.lwn.net/2001/0719/kernel.php3
See also: [14]last week's Kernel page.
Kernel development
The current kernel release is still 2.4.6. Linus's 2.4.7 prepatch is
up to [15]2.4.7pre7, with no word on when a real 2.4.7 release will
happen (to say nothing of the much-awaited 2.5.0). Alan Cox,
meanwhile, is at [16]2.4.6ac5.
Keeping your processes from wandering. In an ideal world, all
processors on an SMP system would be identical, and it would not
matter where any particular process runs. Life is different, of
course, in the real world. Here, not all processors are the same from
a process's point of view.
The bottleneck between the processor and memory forces the use of
multiple layers of cache memory within each processor itself. By
keeping frequently-accessed memory close to the processor, the cache
has a major accelerating effect on performance. Often the best
performance optimizations don't involve squeezing out instructions or
unrolling loops; instead, the best results often come from changing
data access patterns to work better with the processor cache.
Foremost among those optimizations, certainly, is to avoid trashing
the cache completely. But that is what happens when a process moves
from one CPU to another. The cache which has been built up in the old
CPU does not follow the process to its new home. As a result, the
process runs slowly for some time as it fills the cache on the new
processor, perhaps forcing out another process's data while it's at
it.
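The cost of a cold cache can be sketched with a toy model. What
follows is only an illustration under loud assumptions: the CpuCache
class, its LRU replacement policy, and the sizes used are all
hypothetical, and real processor caches are far more complicated.

```python
from collections import OrderedDict


class CpuCache:
    """A tiny LRU cache standing in for one processor's private cache.

    Hypothetical model for illustration only.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def access(self, address):
        """Return True on a hit; on a miss, load the line and return False."""
        if address in self.lines:
            self.lines.move_to_end(address)
            return True
        self.lines[address] = True
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line
        return False


def count_misses(cache, working_set, passes):
    """Walk the working set `passes` times and count the cache misses."""
    misses = 0
    for _ in range(passes):
        for address in working_set:
            if not cache.access(address):
                misses += 1
    return misses


working_set = list(range(64))   # assumed: the process touches 64 cache lines
cpu0 = CpuCache(capacity=128)
cpu1 = CpuCache(capacity=128)

warmup_misses = count_misses(cpu0, working_set, passes=1)    # filling CPU 0
steady_misses = count_misses(cpu0, working_set, passes=10)   # cache is warm
migrated_misses = count_misses(cpu1, working_set, passes=1)  # moved to CPU 1
```

In this model the process misses on every line while it warms up CPU
0, then runs without misses for as long as it stays put. Once it is
moved to CPU 1, it pays the whole warm-up cost a second time.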
For this reason, the Linux scheduler tries hard to avoid moving
processes between CPUs. Normally it works reasonably well; if two jobs
are running on a two-processor system, one would expect each job to
stick to one processor. So a group of kernel hackers were surprised
when they [17]found a case where processes would continually cycle
through all of the processors on a system. Another user [18]reported
similar behavior; he found that running a single, compute-intensive
process on a two-processor system would actually go faster if he fired
up "setiathome" to keep one of the processors occupied.
What appears to be happening is this: one CPU is happily running a
process (we'll call it "p1") when it does something that makes another
process ("p2") runnable. The scheduler decides that p2 should execute
on a different CPU, so it sends an "inter-processor interrupt" to
force the other CPU to go into the scheduler and pick up the new task.
All appears to have been properly arranged, and the scheduler on the
original CPU returns to the original process (p1) that was running
there.
That process, however, quickly hits a stopping point, forcing a new
scheduling decision. Because inter-processor interrupts take a while,
p2 still has not started running on its intended CPU. Instead, the
first CPU sees p2 ready to go, and starts running it. When p1 again
becomes runnable, it will find that p2 has taken its place; it's p1
that gets booted out of its processor and has to move to a new home
with a cold, unwelcoming cache.
With the right kind of load, that sequence of events can happen over
and over, causing processes to move frequently through the system. The
result is poor performance, bad benchmark results, and an increase in
"Linux sucks" posts on the net.
[19]The fix, as posted by Hubertus Franke, is to mark a process when
it is decided that said process will run on a different CPU. Other
processors will not attempt to run a process marked in this way, while
the target processor will make a point of running it. The fix removes
the race condition between the two processors, and restores a bit of
stability in this particular case. Of course, being a scheduler
change, it may well make things worse for some other type of load, but
nobody has identified that load yet...
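The race and the effect of the mark can be sketched in a few lines.
This is a toy model, not the kernel's scheduler and not the posted
patch: the simulate() function, its parameters, and the assumption
that the woken task always goes back to sleep before the next round
are all invented for illustration.

```python
def simulate(rounds, ipi_delay, run_time, mark_target):
    """Count task migrations in a toy two-CPU model (hypothetical).

    ipi_delay:   ticks before the target CPU reacts to the interrupt
    run_time:    ticks the waking task runs before it blocks
    mark_target: if True, the woken task is reserved for the target CPU
    """
    last_cpu = {"p1": 0, "p2": 1}   # CPU each task last ran on
    running, sleeper = "p1", "p2"   # p1 starts out running on CPU 0
    cpu = 0
    migrations = 0

    for _ in range(rounds):
        target = 1 - cpu            # the idle CPU chosen for the woken task
        # The idea of the fix: remember which CPU the task was sent to.
        reserved_for = target if mark_target else None

        # Does the waker block before the other CPU has reacted?
        waker_blocks_first = run_time < ipi_delay

        if waker_blocks_first and reserved_for is None:
            # Race lost: the waking CPU grabs the woken task itself,
            # so the waker later finds its CPU taken and must move.
            woken_cpu = cpu
            waker_cpu = target
        else:
            # Either the interrupt arrived in time, or the mark kept the
            # waking CPU away from the task; both tasks stay where they
            # were meant to be.
            woken_cpu = target
            waker_cpu = cpu

        for task, new_cpu in ((sleeper, woken_cpu), (running, waker_cpu)):
            if last_cpu[task] != new_cpu:
                migrations += 1
            last_cpu[task] = new_cpu

        # The woken task sleeps again; the waker keeps running on
        # whichever CPU it ended up on.
        cpu = waker_cpu

    return migrations
```

With a slow interrupt and a waker that blocks quickly, for example
simulate(5, ipi_delay=10, run_time=2, mark_target=False), both tasks
change CPUs in every round of this model. The same call with
mark_target=True leaves each task on its own processor.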
Journaling filesystems are slower? While nobody disputes the benefits
provided by journaling filesystems, the generally-accepted wisdom
seems to be that journaling necessarily slows things down. After all,
a journaling filesystem adds the overhead of maintaining the journal
and very carefully serializing operations to preserve the integrity of
the filesystem at all times. That extra work costs.
It turns out, however, that there is an important class of
applications for which a journaling filesystem can be faster. Certain
applications need to know when data written to the disk is actually
committed to the platter; usually they are working with explicit data
ordering constraints of their own. Such applications will use one of
the synchronous write operations in the filesystem to enforce these
constraints. Database systems can operate in this mode. The NFS
protocol also requires that a (strictly conforming) NFS server
perform synchronous writes.
A synchronous write operation can cause several disk head seeks, as
the data and associated metadata are updated. And that, of course, can
take a while. When journaling is in use, however, the story is
different. Once all of the relevant data is in the journal, the
filesystem can report a synchronous write as being complete; the full
writeback can then happen at leisure, since the data is safe in the
journal.
And the journal, of course, is laid out on a contiguous piece of the
disk. Journaling, thus, removes the head seeks from synchronous writes
and eliminates much of the latency from those operations. With some
preliminary tests using ext3 and knfsd, performance was [20]reported
to be 1.5 times better. Journaling is not only safer; it may even be
faster.
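The arithmetic behind that argument can be put into a rough cost
model. Everything here is an assumption made for illustration: the
function name, the idea that each scattered update costs one seek, the
single seek charged to the journal, and the timings, which are
invented and are not measurements of ext3 or any real disk.

```python
def sync_write_latency_ms(scattered_updates, seek_ms, transfer_ms, journaled):
    """Time before a synchronous write can be reported complete.

    Hypothetical model: `scattered_updates` counts the separate pieces
    (data block, inode, bitmap, ...) that one write has to update.
    """
    if journaled:
        # Everything is appended to one contiguous journal region:
        # position the head once, then transfer sequentially.
        return seek_ms + scattered_updates * transfer_ms
    # Without a journal, each piece lives somewhere else on the disk,
    # so each one costs a seek plus a transfer.
    return scattered_updates * (seek_ms + transfer_ms)


# Invented example figures: 3 updates, 8 ms per seek, 1 ms per transfer.
in_place = sync_write_latency_ms(3, seek_ms=8, transfer_ms=1, journaled=False)
via_journal = sync_write_latency_ms(3, seek_ms=8, transfer_ms=1, journaled=True)
```

The model ignores the later writeback from the journal to the final
locations, since that happens after the application has already been
told that its write is complete.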
Cleaning out the right zones. Marcelo Tosatti has been working on
[21]a patch which provides detailed information on how the memory
management system is working in the 2.4 kernel. After all, the various
efforts to improve memory management can only be helped by having a
view of what is actually going on. One of the first [22]results that
Marcelo has found is that the code that tries to free up pages in
response to memory shortages is often not looking in the right place.
Linux divides physical memory into multiple "zones," each of which has
different physical characteristics; for example, the DMA zone contains
memory that may be used for DMA operations to ISA devices. (See
[23]the June 7 kernel page for a more detailed discussion of zones.)
Memory allocation can be requested from one or more zones in
particular. Often, only a specific zone will do for a particular
request.
The problem is that, while the kernel allocates memory from specific
zones, it does not take zones into account when freeing memory.
Instead, it blindly passes through memory, freeing anything that looks
useful. As a result, the kernel could be freeing memory (i.e. taking
it from processes that could use it) that belongs to a zone that
already has plenty of free memory and does not need any more.
Meanwhile, another zone could be under tremendous pressure which is
not helped in any way by freeing memory from the first zone.
This sort of behavior has been suspected in the past, but Marcelo's
instrumentation has shown that it really happens. So what is to be
done but make [24]a new patch which causes the kernel to go after
pages belonging to the specific zones that are feeling pressure?
Evidently some sorts of deadlock problems have already been solved by
this patch. It will see some reworking (Linus had some quibbles with
the implementation), but this one looks destined for a 2.4 kernel
sometime soon. (See also: Dave McCracken's [25]patch for a silly
swapping bug that would prevent the use of high memory for swap reads;
this one, too, could be responsible for a lot of problems.)
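The difference between blind and zone-aware freeing can be shown with
a small model. This is a sketch under stated assumptions, not
Marcelo's patch: the Zone class, its field names (low_watermark,
reclaimable), the reclaim() function, and the page counts are all
hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    """A memory zone in a toy model (field names are hypothetical)."""
    name: str
    free_pages: int
    low_watermark: int   # below this, the zone is considered under pressure
    reclaimable: int     # pages that could be taken back from their users

    def under_pressure(self):
        return self.free_pages < self.low_watermark


def reclaim(zones, wanted, zone_aware):
    """Free up to `wanted` pages; return {zone name: pages freed}."""
    freed = {}
    for zone in zones:
        if wanted == 0:
            break
        if zone_aware and not zone.under_pressure():
            continue  # this zone has plenty of free memory; leave it alone
        take = min(wanted, zone.reclaimable)
        zone.reclaimable -= take
        zone.free_pages += take
        wanted -= take
        if take:
            freed[zone.name] = take
    return freed


def make_zones():
    # Invented numbers: the normal zone is comfortable, the DMA zone is not.
    return [
        Zone("Normal", free_pages=5000, low_watermark=500, reclaimable=20000),
        Zone("DMA", free_pages=10, low_watermark=100, reclaimable=2000),
    ]


blind = reclaim(make_zones(), wanted=200, zone_aware=False)
aware = reclaim(make_zones(), wanted=200, zone_aware=True)
```

In the blind case all 200 pages come out of the normal zone, which did
not need them, and the DMA zone is no better off. The zone-aware pass
skips the comfortable zone and takes the pages from the one that is
actually short.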
Other patches and updates released this week include:
* The Stanford Checker is back. The latest results include [26]code
which uses memory that has been freed (10 instances), and
[27]unsafe use of user-supplied values (52 instances), which can
lead to nasty security bugs.
* IBM has released [28]version 2.2.0 of the Dynamic Probes kernel
debugging tool.
* Keith Owens has [29]released a new version of the 2.5 kernel build
system which has the "implicit dependency" problem solved.
* Justin Gibbs has [30]announced a beta release of version 6.2.0 of
the aic7xxx SCSI driver. Among other things, it includes high
addressing support.
* The example driver code from the second edition of Linux Device
Drivers is now available for download from [31]the O'Reilly web
site. The full release of the book source will take a little
longer, however.
Section Editor: [32]Jonathan Corbet
July 19, 2001
For other kernel news, see:
* [33]Kernel traffic
* [34]Kernel Newsflash
* [35]Kernel Trap
Other resources:
* [36]Kernel Source Reference
* [37]L-K mailing list FAQ
* [38]Linux-MM
* [39]Linux Scalability Project
* [40]Kernel Newbies
[42]Eklektix, Inc. Linux powered! Copyright © 2001 [43]Eklektix, Inc.,
all rights reserved
Linux (R) is a registered trademark of Linus Torvalds
References
1. http://lwn.net/
2. http://ads.tucows.com/click.ng/pageid=001-012-132-000-000-003-000-000-012
3. http://lwn.net/2001/0719/
4. http://lwn.net/2001/0719/security.php3
5. http://lwn.net/2001/0719/dists.php3
6. http://lwn.net/2001/0719/desktop.php3
7. http://lwn.net/2001/0719/devel.php3
8. http://lwn.net/2001/0719/commerce.php3
9. http://lwn.net/2001/0719/press.php3
10. http://lwn.net/2001/0719/announce.php3
11. http://lwn.net/2001/0719/history.php3
12. http://lwn.net/2001/0719/letters.php3
13. http://lwn.net/2001/0719/bigpage.php3
14. http://lwn.net/2001/0712/kernel.php3
15. http://lwn.net/2001/0719/a/2.4.7pre7.php3
16. http://lwn.net/2001/0719/a/2.4.6ac5.php3
17. http://lwn.net/2001/0719/a/cpu-hopping.php3
18. http://lwn.net/2001/0719/a/gzip-case.php3
19. http://lwn.net/2001/0719/a/sched-fix.php3
20. http://lwn.net/2001/0719/a/am-journaling.php3
21. http://lwn.net/2001/0719/a/vm-stats.php3
22. http://lwn.net/2001/0719/a/zoneinfo.php3
23. http://lwn.net/2001/0607/kernel.php3
24. http://lwn.net/2001/0719/a/zone-fix.php3
25. http://lwn.net/2001/0719/a/swapbug.php3
26. http://lwn.net/2001/0719/a/c-memory.php3
27. http://lwn.net/2001/0719/a/c-security.php3
28. http://lwn.net/2001/0719/a/dprobes.php3
29. http://lwn.net/2001/0719/a/kbuild.php3
30. http://lwn.net/2001/0719/a/aic7xxx.php3
31. http://www.oreilly.com/catalog/linuxdrive2/
32. mailto:lwn@lwn.net
33. http://kt.zork.net/
34. http://www.atnf.csiro.au/~rgooch/linux/docs/kernel-newsflash.html
35. http://www.kerneltrap.com/
36. http://lksr.org/
37. http://www.tux.org/lkml/
38. http://www.linux.eu.org/Linux-MM/
39. http://www.citi.umich.edu/projects/linux-scalability/
40. http://www.kernelnewbies.org/
41. http://lwn.net/2001/0719/dists.php3
42. http://www.eklektix.com/
43. http://www.eklektix.com/
--- ifmail v.2.14.os7-aks1
* Origin: Unknown (2:4615/71.10@fidonet)