On Mindcraft's April 1999 Benchmark


I just compiled [Apache] without any modules disabled .... I'm using the highperformance-conf.dist config file from the distribution." Karthik's post to linux-kernel and its followups are also available. This sounds rather like the behavior Mindcraft reported ("After the restart, Apache performance climbed back to within 30% of its peak from a low of about 6% of the peak performance").

Kernel issue #2: Wake-One and the Thundering Herd



(Note: According to the Linux Scalability Project's paper on the thundering herd problem, a "task exclusive" wake-one patch is now integrated into the 2.3 kernel; however, according to Andrea, as of 2.4.0-test10 it still wakes up processes in the same order they were put to sleep, which is not optimal from a caching point of view. Waking them in the reverse order would be more efficient. See also the Nov 2000 measurements by Andrew Morton: post 1, post 2, and Linus' reply.)

- Phillip Ezolt, 5 May 1999, in linux-kernel ( Overscheduling DOES happen with high web server load. ): "When I ran a SPECWeb96 strobe on Alpha/linux, I found 18% of the time was spent in the scheduler when the CPU is pegged." This is what Russinovich mentioned in his critique of Linux. This post sparked a lively thread in linux-kernel (now in its second week). It looks like Apache (and the scheduler) are ready for some changes.

- Rik van Riel, 6 May 1999, in linuxperf ( Re: [linuxperf] Possible fix for Mindcraft Apache problem ): ... The main problem with web benchmarks remains. The way Apache (and Linux) 'cooperate' is problematic. That is, when a signal comes in, all processes are woken up and the scheduler has to select one from the dozens of new runnable processes.... The real solution is switching from wake-all semantics to wake-one semantics. This will avoid the run queues Phillip Ezolt at DEC experienced. The good news is that it's a simple patch that can probably be fixed within a few days...

- Tony Gale, 6 May 1999, in linuxperf ( Re: [linuxperf] Possible fix for Mindcraft Apache problem ): Apache uses file locking to serialise access to the accept call. This can be very expensive for some systems. I haven't had time to run the Linux numbers yet for the 10 or more server models that are available to find the most efficient. Check Stevens UNPv1 2nd Edition Chapter 27 for details.
- Andrea Arcangeli, 12 May 1999, in linux-kernel ( [patch] wake_one for accept(2) [was Re: Overscheduling DOES happen with high web server load.] 2.2.8_andrea1.bz2 ): I released a new andrea patch against 2.2.8. This new one has my new wake-one on accept(2) straightforward code (but to get this improvement you must make certain that your apache tasks are sleeping in accept(2)). A strace -p `pidof apache` should tell you that. This patch is linked from here.

David Miller's response to the above: ...on every new TCP connection there will be two spurious and unneeded wakeups. These originate in the write_space socket callback because we free up the SYN frame, which wakes up listening socket sleepers. This exact problem is what I've been working on today.

- Ingo Molnar, 13 May 1999, in linux-kernel ( Re: [RFT] 2.2.8_andrea1 wake-one [Re: Overscheduling DOES happen with high web server load.] ): Note that pre-2.3.1 already includes a wake-one implementation of accept()... and there are more.

- Phillip Ezolt, 14 May 1999, in linux-kernel ( Great News!! Was: [RFT] 2.2.8_andrea1 wake-one ): I've been doing some more SPECWeb96 tests, and with Andrea's patch to 2.2.8 (ftp://ftp.suse.com/pub/people/andrea/kernel/2.2.8_andrea1.bz) on identical hardware, I get web-performance nearly identical to Tru64! ...

  Tru64            4ms
  2.2.5          100ms
  2.2.8            9ms
  2.2.8_andrea1    4ms

... The time spent in schedule has decreased as shown by this Iprobe data. The number of SPECWeb96 maxOps per second has increased as well. Please add the wake-one patch to the 2.2.X kernel.

Larry Sendlosky tested this patch and said: The 2.2.8 patch is effective in improving apache performance on a single CPU system, but it doesn't improve performance on an SMP system with two CPUs.
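To make the wake-all versus wake-one distinction discussed above concrete, here is a minimal user-space sketch in Python. It is only an analogy: threads sleeping on a condition variable stand in for Apache children sleeping in accept(2), and the names `count_wakeups`, `workers_cv` and `producer_cv` are invented for this illustration. It is not kernel code and not taken from any of the patches mentioned. It counts how many sleepers get woken while jobs arrive one at a time.

```python
import threading


def count_wakeups(num_workers, num_jobs, wake_one):
    """Return (wakeups, handled) for jobs delivered one at a time.

    Assumption: a condition variable models the kernel wait queue;
    notify(1) models wake-one, notify_all() models wake-all.
    """
    lock = threading.Lock()
    workers_cv = threading.Condition(lock)   # idle workers sleep here
    producer_cv = threading.Condition(lock)  # producer waits for workers to settle
    state = {"pending": 0, "asleep": 0, "wakeups": 0, "handled": 0, "stop": False}

    def worker():
        with lock:
            while True:
                while state["pending"] == 0 and not state["stop"]:
                    state["asleep"] += 1
                    producer_cv.notify()
                    workers_cv.wait()
                    if not state["stop"]:
                        state["wakeups"] += 1  # woken for a job (maybe in vain)
                if state["stop"]:
                    return
                state["pending"] -= 1          # this worker wins the job
                state["handled"] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()

    with lock:
        for _ in range(num_jobs):
            # Deliver the next job only once every worker is asleep again.
            while state["asleep"] < num_workers:
                producer_cv.wait()
            state["pending"] += 1
            if wake_one:
                state["asleep"] -= 1
                workers_cv.notify(1)
            else:
                state["asleep"] = 0
                workers_cv.notify_all()
        while state["asleep"] < num_workers:
            producer_cv.wait()
        state["stop"] = True
        workers_cv.notify_all()

    for t in threads:
        t.join()
    return state["wakeups"], state["handled"]
```

In this sketch every job is handled exactly once in both modes, but wake-all wakes every sleeper for every job, so the number of wakeups grows with the number of workers, while wake-one wakes one sleeper per job. That wasted work is the "thundering herd" cost the posts above describe.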



See below. Also see:

- Dimitris Michailidis, 14 May 1999, in linux-kernel ( [PATCH] scheduler fixes and improvements ) -- several improvements to the 2.2.8 scheduler.

- Andrea Arcangeli, 21 May 1999, in linux-kernel ( Re: andrea buffer code (2.2.9-C.gz) ) -- an update; it may include some SMP bottleneck fixes.

Kernel issue #3: SMP Bottlenecks in the 2.2 Kernel



- Juergen Schmidt, 19 May 1999, in linux-kernel ( Bad apache perfomance wtih linux SMP ), asked what could make Apache do poorly under SMP. Andi Kleen replied: One culprit is most likely that the data copy for TCP sending runs completely serialized. This can be fixed by replacing the

  skb->csum = csum_and_copy_from_user(from, skb_put(skb, copy), copy, 0, &err);

in tcp.c:tcp_do_sendmsg with

  unlock_kernel();
  skb->csum = csum_and_copy_from_user(from, skb_put(skb, copy), copy, 0, &err);
  lock_kernel();

The patch does not violate any locking requirements in the kernel... [To fix your connection refused errors,] try:

  echo 32768 > /proc/sys/fs/file-max
  echo 65536 > /proc/sys/fs/inode-max

Overall it should be clear that the current Linux kernel doesn't scale to multiple CPUs for system load (user load is fine). I blame the Linux vendors for advertising that it does, although it is not true. ... The work to fix all these issues is underway. [2.3 will be fixed first, then the modifications will be backported into 2.2.]

[Note: Andi's TCP unlocking fix appears to be in 2.2.9-ac3.]

Andrea Arcangeli responded, describing his own version of this fix ( ftp://ftp.suse.com/pub/people/andrea/kernel/2.3.3_andrea2.bz2 ) as less cluttered: If you look at my patch (the second one, in the first one I missed the reaquire_kernel_lock done before returning from schedule, woops :) then you'll see my approach to address the unlock-during-uaccess. My patch doesn't change tcp/ip, ext2, etc... but it only affects uaccess.h and usercopy.c. I don't like to have unlock_kernel everywhere.

Juergen Schmidt, 26 May 1999, in linux-kernel and new-httpd ( Linux/Apache/SMP - My fault ), retracted his earlier problem report: I reported "disastrous" performance for Linux/Apache on an SMP platform. To doublecheck, I've downloaded a clean kernel source (2.2.8 and 2.2.9) and had to realize that those do *not* show the reported penalty when running on SMP systems. After seeing the first extremely bad results, I mistakenly used the installed kernel sources (which were patched from 2.2.5 to 2.2.8). These sources had already been modified before the machine arrived to me. I should have thrown them out in the first place. Please excuse my confusion.

Others have reported modest performance gains (20% or so) with Andrea's SMP fix, but only when serving largish files (100 kilobytes).

Juergen has now completed his testing. Unfortunately, he neglected to compile Apache with -DSINGLE_LISTEN_UNSERIALIZED_ACCEPT, which (according to Andrea) significantly hurt Apache performance. If Juergen missed that, it means it's too hard to figure out. To make it easier to get good performance in the future, we need the wake-one patch added to a stable kernel (say, 2.2.10), and we need Apache's configuration script to notice that the system is being compiled for 2.2.10 or later, and automatically select SINGLE_LISTEN_UNSERIALIZED_ACCEPT.

Other Apache users report performance problems



Mike Whitaker, 22 May 1999, in linuxperf ( High load under Apache1.3.3/mod_perl1.16/Linux2.2.7SMP ), described a performance problem: A typical webserver is a dual PII450 with split httpds: typically 300 static httpds serve the pages and proxy to 80-100 mod_perl httpds that serve the adverts. Unneeded modules are disabled and hostname lookups turned off, just as any sensible person would. There are usually between one and three mod_perl hits per page, plus the usual dozen or so inline images. The kernel (2.2.7) has MAX_TASKS upped to 4090, and the unlock_kernel/lock_kernel around csum_and_copy_from_user() in tcp_do_sendmsg that Andi Kleen suggested. The result is quite interesting. Load fluctuates between 10-12, while the user CPU goes from 0 (80% idle) up to 180% (0% idle, machine *crawling*), around once per minute and a third. vmstat shows the number of processes in a run state ranging from 0 when load is low to 30-40 to 60-40, while the static servers manage 60-70 hits per second. Without the dynamic httpds, everything *flies*.

He was advised to try a kernel that has wake-one support, and reported back: Identical systems: dual PII450, 1G, and two disk controllers. The wake-one patch is clearly doing its job. The 2.2.7 machine still has loads into three figures, while the 2.3.3 machine hasn't actually managed a load of 1. Unfortunately, observation suggests that about one connection in ten is being lost or ignored by the 2.3.3 machine/Apache combination (network error: connection reset by peer).

His next update was on May 25th:

2.2.9_andrea3 (wake-one) patch seems to do the trick: it can handle hits at a speed which suggests it's pushing the advert server close to its observed maximum. (I have already warned you to avoid 2.2.8; it can destroy hard disks. For more information, see the threads on linux-kernel.) However... when the idle CPU drops to zero (i.e. it's spending most of its time processing advert requests), everything goes unpleasantly pear-shaped, with a load of 400+, and the number of httpds on both types of server *well* above MaxClients (in fact, suspiciously close to MaxClients + MinSpareServers). This can be triggered by a spike in demand. Once this happens, it is difficult to get out of this state. The remedy is counterintuitive: you can *REDUCE* MaxClients and hope that the TCP listen queue can handle a load surge. This works, as evidenced by experience. (As an aside, this is a perfect use case for Eddieware's load-balancing DNS.)

- Eric Hicks, 26 May 1999, in linux-kernel ( Apache/kernel problem? ): ... I am experiencing major problems. It seems that a single PII 400MHz or a single AMD 400 will outrun a dual PII 450 at HTTP requests from Apache. ... HTTP server test data: 100 1MByte MPEG files stored on local drives. Results:
  - AMD 400MHz K6, 128MB, Linux 2.0.36; handles 1000 simultaneous clients @ 57.6Kbits/sec.
  - PII 400MHz, 512MB, Linux 2.0.36; handles 1000 simultaneous clients @ 57.6Kbits/sec.
  - Dual PII 450MHz, 512MB, Linux 2.2.8; handles far fewer than 300 clients @ 57.6Kbits/sec.

I advised him to try 2.2.9_andrea3, and he said he would.

Kernel issue #4: Interrupt Bottlenecks



According to Zach, the Mindcraft benchmark's use of four Fast Ethernet cards and a quad SMP system exposes a bottleneck in Linux's interrupt processing; the kernel spent a lot of time in synchronize_bh(). (A single Gigabit Ethernet card would lessen this bottleneck.) According to Mingo, TCP throughput scales much better with the number of CPUs in 2.3.9 than it did in 2.2.10, although he hasn't tried it with multiple Ethernets yet.

Steven Guo and Steve Underwood also commented on the issue of interrupts under heavy loads.

See also Linus's "State of Linux" talk at Usenix '99, where he talks about the Mindcraft benchmark and SMP scalability, and SCT's Jan 2000 comments regarding progress in scalability.

Softnet is coming! Kernel 2.3.43 introduces the softnet networking changes. Softnet modifies the interface to the network card drivers, which means that every driver needs to be updated. However, large SMP systems should see much higher network performance. (For more info, see Alexey's readme.softnet, his softnet-howto, or his Feb 15 post about how to convert old drivers.)

The Feb '00 thread Gigabit Ethernet Bottlenecks (especially its second week) has lots of interesting tidbits about what interrupt (and other) bottlenecks remain, and how they are being addressed in the 2.3 kernel. Ingo Molnar's post of 27 February 2000 explains the IA32 code's improvements to interrupt handling in great detail. These improvements will be integrated into the core kernel in 2.5, it seems.

Kernel issue #5: Mysterious Network Slowdown



This is a bug and not a scaling issue. Several 2.2 users have reported that sometimes networking slows down to 1 to 10% of normal, with high ping times, and that cycling the interface fixes the problem temporarily.

- Oystein Sigsen reported: After upgrading to 2.2, we experienced occasional slowdowns in TCP performance. The performance returns to normal after I take down the interface. Once that is done, the performance is good for a few days, or even weeks.

- David Stahl reported on 29 Jun 1999: I have 3 computers running 2.2.10 [with multiple] 3COM 905b PCI [cards]... After approximately two days of uptime I will begin to notice ping times jump to 7-20 secs on the local network. As others have noticed, there is no loss -- just some damn high latency. ... It seems to be dependent upon the network load -- lighter loads lead to longer periods between problems. It is also gradual. It will start at 4 seconds, then 7 seconds, then 30 minutes later it can go up to 12-20 seconds.

- Another eepro100 report. A tulip report. Less likely to happen again.

- David Stahl wrote on 13 July 1999: What DID fix the problem was a private reply from someone else (sorry about the credit, but i'm not in the mood to sieve 10k emails right now), to try the alpha version of the latest 3c59x.c driver from Donald Becker (http://cesdis.gsfc.nasa.gov/linux/drivers/vortex.html). 3c59x.c:v0.99L 5/28/99 is the version that fixed it, from ftp://cesdis.gsfc.nasa.gov/pub/linux/drivers/test/3c59x.c

- On 23 Sep 1999, Alexey posted a one-line patch that clears up a similar mysterious slowdown. This patch has been applied to Red Hat 6.1 and 2.2.13. It was also applied on three Red Hat 6.x systems I know of that have Masq support installed and are connected to cable modems; it corrected a bug that caused very high ping times even after short bursts of heavy TCP transfer to distant hosts.
- Rickard Cedergren, Michael Brown and others reported on linux-kernel on 21 October that Alexey's patch had greatly improved the problem, but that it is still not completely gone.

- Tony Hoyle is also seeing occasional long delays with 2.2.13.

- Jeremy Fitzhardinge reported another big delay; the replies say it's likely caused by a particular Tulip driver.

Kernel issue #6: 2.2.x/NT TCP Slowdown



Petru Paler, 10 July 1999, in linux-kernel ( [BUG] TCP connections between Linux and NT ), reported that any kind of TCP connection between Linux (2.2.10) and an NT Server 4 (Service Pack 5) slows down to a crawl. With 2.0.37, the problem was less severe (6kbytes/sec). A tcpdump log of a slow connection helped Andi Kleen see that NT was taking a long time to ACK a packet, which was causing Linux to throttle back.

Solved: false alarm! It wasn't Linux' fault at all. It turned out that NT had to be told not to use full duplex mode with the ethernet card.

Kernel issue #7: Scheduler



Phil Ezolt, 22 January 2000, in linux-kernel ( Re: Interesting analysis on linux kernel threading from IBM ): When I run the SPECWeb96 test here, I see both a large number of running processes and a lot of context switches. ... Here's a sample of the vmstat data:

  procs memory swap io system cpu
  r b w swpd free buff cache si so bi bo in cs us sy id
  ...
  24 0 02320 2066936 590088 1010664 0 03961
  24 0 02320 2065752 590664 1061064 0 03961 1

Notice: 24 running processes and 7000 context switches. That is a lot of overhead. Every second, 7000*24 goodness values are calculated, not the (20*3) that a desktop system sees. This is a matter of scalability. A better scheduler means better scalability. Don't tell us that benchmark data is useless. Unless you can give me data from real systems and tell me where their faults are, benchmark data is all we have. SPECWeb96 pushes Linux until it bleeds. I'm telling you where it bleeds. You have two options: fix it or bury your head in the sand. It might not be what your system sees today, but it will in the future. Would you rather fix it now or wait until someone else has thrown down the performance gauntlet? ... Here's a juicy fact. During my runs you will see 98% contention on the [2.2.14] kernel lock. It is accessed a lot. I don't know how 2.3.40 compares, because I don't have big memory support for it. Hopefully, Andrea will be kind enough to give me a patch, and then I can see if things have improved.

[Phil's data pertains to the webserver that was subject to the SPECWeb96 test. It is an ES40 4 CPU alpha EV6 running Redhat 6.0 w/kernel v2.2.14 w/SGI speed patches; the interfaces taking the load are 2 ACENic gigabit ethernet cards.]

Kernel issue #8: SMP Bottlenecks in the 2.4 Kernel



Manfred Spraul, 21 April 2000, in linux-kernel ( [PATCH] f_op->poll() without lock_kernel() ): [email protected] noticed that select() caused a high contention for the kernel lock, so here is a patch that removes lock_kernel() from poll(). [tested] with 2.3.99-pre5.

There was some discussion over whether this was the right decision at this late stage, but Linus Torvalds and David Miller were enthusiastic. Another bottleneck appears to be on the way out.

On 26 April 2000, [email protected] posted benchmark results in linux-kernel with and without the lock_kernel() in poll(). The followups included a kernel patch to improve checksum performance and a patch for Apache 1.3 to force it to align its buffers to 32-word boundaries. The latter patch, by Dean Gaudet, earned praise from Linus, who relayed rumors that this can speed up SPECWeb results by 3%. This thread was very interesting.

Kernel issue #9: csum_partial_copy_generic



Kumon, 19 May 2000, in linux-kernel ( [PATCH] Fast csum_partial_copy_generic and more ), reports a 3% reduction in total CPU time compared to 2.3.99-pre8 on i686 by optimizing the cache behavior of csum_partial_copy_generic. The workload was ZD's WebBench. He adds: The benchmark we used has almost the same setting as the MINDCRAFT ones, but the apache setting is [changed] slightly not to use symlink checking. We used only 24 clients independent of each other and there were 16 apache processes. Four-way XEON processor systems are used. The performance is twice that of a single CPU.

Note that in ZD's benchmarks with 2.2.6, a 4 CPU system only achieved a 1.5x speedup over a single CPU. Kumon reports a >2x speedup. This seems to be similar to the speedup NT4.0sp3 achieved using 4 CPUs with 24 clients. It's encouraging to hear that things may have improved in the 11 months since the 2.2.6 tests.

Kumon indicated that the major improvement between pre3 and pre5 was the poll optimization: Until pre4 (I forget exact version), kernel-lock prevents performance improvement. The following mails will help to understand the background if you can retrieve lk mails between Apr 20-25.
  subject: namei() query
  subject: [PATCH] f_op->poll() without lock_kernel()
  subject: lockless poll() (was Re: namei() query)
  subject: "movb" for spin-unlock (was Re: namei() query)
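To put the 1.5x and 2x figures above in perspective, the following sketch works out the parallel efficiency on four CPUs and, under Amdahl's law, the serial fraction each speedup would imply. Amdahl's law is a modelling assumption introduced here, not something the posts use, and the function names are invented for this illustration. Kumon's ">2x" is treated as exactly 2.0, so its result is an upper bound on the serial fraction.

```python
def parallel_efficiency(speedup, cpus):
    """Fraction of ideal linear scaling actually achieved."""
    return speedup / cpus


def amdahl_serial_fraction(speedup, cpus):
    """Serial fraction f implied by Amdahl's law.

    Amdahl: speedup = 1 / (f + (1 - f) / cpus)
    Solving for f:  f = (cpus / speedup - 1) / (cpus - 1)
    """
    return (cpus / speedup - 1) / (cpus - 1)


if __name__ == "__main__":
    for label, speedup in (("2.2.6 (ZD)", 1.5), ("2.3.99-pre8 + patch", 2.0)):
        eff = parallel_efficiency(speedup, 4)
        serial = amdahl_serial_fraction(speedup, 4)
        print(f"{label}: efficiency {eff:.3f}, implied serial fraction {serial:.3f}")
```

Under this simple model, going from 1.5x to 2x on four CPUs corresponds to the serialized share of the work dropping from about 56% to about 33%, which is consistent with the picture of kernel-lock contention being reduced but not eliminated.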

Kumon posted again on 4 September 2000, noting that his changes hadn't been integrated into the kernel.

Kernel issue #10: getname(), poll() optimizations



Manfred Spraul posted a patch to linux-kernel on 22 May 2000. It optimized kmalloc(), getname() and select() a little, speeding Apache up by 1.5% on 2.3.99-pre8.

Kernel issue #11: Reducing lock contention and poll overhead in 2.4



Alexander Viro posted a patch on 30 May 2000 that removed a large lock in close_flip() and _fput(). He requested testing. Kumon ran a benchmark and reported: I measured viro's ac6D patches with WebBench using the 4cpu Xeon systems. I applied to 2.4.0-test1 and not ac6. The patch decreased stext_lock time by 50% and OS time by 4%. ... Some part of the kmalloc/kfree overhead comes from do_select, and it is easily eliminated using a small array on the stack.

Kumon then posted a patch which avoids kmalloc/kfree for select() and poll() when the number of fd's is less than 64.

Kernel issue #12: Poor disk seek behavior in 2.2, new elevator code in 2.4



On 20 July 2000, Robert Cohen posted a report in linux-kernel listing netatalk (appletalk file sharing) benchmarks comparing 2.0, 2.2, and several versions of 2.4.0-pre. The elevator code in 2.4 seems helpful (some versions can handle 5 benchmark clients instead o... The test4 version and test5pre2 have not fared as well. They manage 2 clients on a 128Meg server well, so they're doing much better than 2.2. But they choke and go seek-bound with 4 clients. It's clear that things have changed since test1-ac22.

Here's a new update. The *only* 2.4 kernel versions that could handle 5 clients were 2.4.0-test1-ac22-riel and 2.4.0-test1-ac22-class 5+; everything before and after (up to 2.4.0-test5pre4) can only handle 2.

Robert Cohen posted a patch on 26 Sept 2000. It included a simple program that demonstrated the problem. Jens Axboe replied that he and Andrea had a patch almost ready for 2.4.0-test9-pre5 to fix this problem.

Robert Cohen posted an update on 4 Oct 2000 with benchmark results for many kernels, showing that the issue still exists in 2.4.0-test9.

Kernel issue #13: Fast Forwarding / Hardware flow control



On 18 Sept 2000, Jamal posted a note in linux-kernel describing proposed changes to the 2.4 kernel's network driver interface; the changes add hardware flow control and several other refinements. He says: Robert Olson and I decided after the OLS that we were going to try to hit the 100Mbps (148.8Kpps) routing peak by year end. I am afraid the bar has been raised. Robert is already hitting 148Kpps with 2.4.0-test7, using an ASUS motherboard carrying a PIII 700MHz Coppermine with approximately 65% CPU utilization. I was able to get a consistent value in the 110Kpps range with a single PII-based Dell computer. As an example, I have attached a modified tulip driver (hacked by Alexey and modified by Robert over a period) to show how feedback values can be used. ... I believe we could have done better with the mindcraft tests with these changes in 2.2 (and HW FC turned on). [update] BTW, I am informed that Linux people were _not_ allowed to change the hardware for those tests, so I don't think they could have used these changes if they were available back then.

Kernel tuning issue: hitting TIME_WAIT



Takashi Richard Horikawa posted a report on linux-kernel on 30 March 2000 that listed SPECWeb96 results for both 2.2.14 and 2.3.41. Performance between a client and server both running 2.2.14 was poor. This was because too few local ports were available, so ports were still in TIME_WAIT when they were needed again. The moral of the story may be to tune the client and servers to use as large a port range as possible, e.g. with

  echo 1024 65535 > /proc/sys/net/ipv4/ip_local_port_range

to avoid bumping into this situation when trying to simulate large numbers of clients with a small number of client machines. On 2 April 2000, Mr. Horikawa confirmed that increasing the local port range with the above command solved the problem.

Suggestions regarding future benchmarks



- Become familiar with linux-kernel, the Apache mailing lists, and the Linux newsgroups on Usenet (try a DejaNews power search in forums matching '*linux*'). See if people agree with your configuration. Also, be open about the benchmark: post intermediate results and ask for suggestions. Expect to spend a week or so mulling over ideas with these mailing lists during the course of your tests.

- If possible, use a modern benchmark like SPECWeb99 rather than the simple ones used by Mindcraft.

- It might be interesting to inject latency into the path between the server and the clients to more realistically model the situation on the Internet.

- If possible, benchmark both single and multiple CPUs. Be aware that the networking performance of version 2.2.x of the Linux kernel does not scale well as you add more CPUs and Ethernet cards. This mostly applies to static pages and cached dynamic pages. Noncached dynamic pages take a lot of CPU time and should scale well with additional CPUs. Caching frequently generated pages can speed up dynamic page serving.

- When testing dynamic content: Don't use the old model of running a separate process for each request; nobody running a big web site uses that interface anymore, as it's too slow. Use a modern interface for dynamic content generation (e.g. Apache mod_perl).

Configuring Linux



Tuning problems probably accounted for less than a 20% performance decrease in Mindcraft's test, so as of 3 October 1999, most people will be happy with a stock 2.2.13 kernel or whatever comes with Red Hat 6.1. When the 2.4 kernel is available, it will improve SMP performance.

If you're interested in seeing what people were doing in June, here are some notes:

- Linux kernel 2.2.9 with 2.2.9_andrea3 has been praised for its performance on a dual-processor system as of June 1 (see above). (2.2.9_andrea3 seems to include both a wake-one scheduler fix and an SMP unlock_kernel fix.) (andrea3 works only on x86. PPCs and Alphas will need to apply another wake-one or tcp copy kernel_unlock fix.) Jan Gruber writes: "the 2.2.9_andrea3-patch doesn't compile with SMP Support disabled. Andrea told me to use ftp://ftp.suse.com/pub/people/andrea/kernel-patches/2.2.9_andrea-VM4.gz instead."

- Andrea Arcangeli asked me on 7 June: If you plan to bench, would you also bench the patch below? ftp://e-mind.com/pub/andrea/kernel-patches/2.2.9_andrea-perf1.gz

- On 11 Oct 1999, Andrea Arcangeli posted his list of pending 2.2.x patches, waiting to go into 2.2.13 or so. These patches might improve performance of SMP systems and systems that are subjected to heavy I/O. They might be worth considering if you encounter bottlenecks.

- For those who are truly brave, you might want to use the kernel-mode http server, khttpd, as a front-end to Apache. It speeds up static web page fetches tremendously. It's at version 0, so be careful.

- linux-kernel (week 1, week 2) is currently discussing Apache benchmarking. Linus Torvalds is generally positive about using khttpd, or something similar, and points to the fact that NT is doing exactly the same thing.

Configuring Apache



- The usual optimizations should be applied (all unused modules should be left out when compiling, host name lookup should be disabled, and symbolic links should be followed; see http://www.apache.org/docs/misc/perf-tuning.html)

- Apache should be compiled to block in accept, e.g.

  env CFLAGS='-DSINGLE_LISTEN_UNSERIALIZED_ACCEPT' ./configure

- The http://www.arctic.org/~dgaudet/apache/1.3/top_fuel.patch may be worth applying. PC Week used top_fuel in their latest benchmarks. (See also Dean Gaudet's comments in linux-kernel and new-httpd.) Supposedly, applying top_fuel.patch and using mod_mmap_static on a set of documents can reduce the number of syscalls per request from 18 to 9.

- For static file benchmarks, try compiling mod_mmap_static into Apache (see http://www.apache.org/docs/mod/mod_mmap_static.html) and configuring Apache to memory-map the static documents, e.g. by creating a config file like this:

  find /www/htdocs -type f -print | sed -e 's/.*/mmapfile &/' > mmap.conf

and including mmap.conf in your Apache config file.

- Many people have suggested using Squid as a front-end to Apache; it would greatly accelerate static web page fetches.
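For readers who would rather generate mmap.conf from a script, here is a rough Python equivalent of the find/sed pipeline above. It is a sketch under stated assumptions: the function names `mmap_directives` and `write_mmap_conf` are invented for this illustration, the output is sorted (find does not sort), and paths containing spaces are not quoted, so they would need extra care before Apache could read them.

```python
import os


def mmap_directives(doc_root):
    """Return one 'mmapfile <path>' line per regular file under doc_root."""
    lines = []
    for dirpath, _dirnames, filenames in os.walk(doc_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Like `find -type f`: regular files only, symlinks skipped.
            if os.path.isfile(path) and not os.path.islink(path):
                lines.append("mmapfile " + path)
    return sorted(lines)


def write_mmap_conf(doc_root, conf_path):
    """Write the directives to conf_path, one per line."""
    with open(conf_path, "w") as conf:
        for line in mmap_directives(doc_root):
            conf.write(line + "\n")
```

Calling `write_mmap_conf("/www/htdocs", "mmap.conf")` would produce a file in the same format as the shell one-liner. Since the file list is fixed when the config is generated, it has to be regenerated and Apache restarted whenever the set of documents changes.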

Related reading

- A few Usenet posts showing people experiencing slowness with Apache or Linux:
  - "Apache is not as fast as people claim??", 1999/04/05, comp.infosystems.www.servers.unix: "...when we run WebBench to test the requests/sec and total throughput, Microsoft IIS 4 is 3 times faster for both Linux and Mac OS X."
  - "Re: Apache vs IIS 4: IIS 4 3 times faster", 1999/04/02, comp.infosystems.www.servers.unix: "Why are you surprised? I assumed Apache was slow. I haven't tested IIS but I did compare Apache to a few other servers last year. I found some that were three to four times faster."

- Ways to profile the kernel:
  - Kernel Spinlock Metering for Linux IA32. This tool measures SMP spinlock contention. See also some test results comparing 2.2 to 2.3, and an example of someone using spinlock measurement to find and fix kernel bottlenecks in 2.3.19.
  - Andrea Arcangeli's ikd
  - sgi's gprof kernel profiling patch (original announcement)
  - Ingo Molnar's ktracer -- for 2.1.x
  - Example of ktracer use
  - Example of both ktracer and ikd profile output
  - Christoph Lameter's perfstat patch, at Captech's Linux Performance, Stability and Scalability Project -- see also their 25 Oct 99 post on linuxperf

- Ways to profile user programs:
  - The old favorite: compile with -pg, and analyze gmon.out with gprof.
  - Mikael Pettersson's x86 performance-monitoring counters patch. Supports 2.3.22, 2.2.13. List of related tools.
  - David Mentre PCL - Performance Counter Library - How to use hardware performance counters with Linux
  - Stephan Meyer's MSR patch -- only supports up to 2.2.6. No longer being actively developed.
  - Richard Gooch's MSR patch and PTC patch -- only supports version 2.2. Requires devfs.

- A few linux-kernel posts:
  - "2.2.5 optimizations for web benchmarks?", 16 Apr 1999 -- Karthik Prabhakar, about to do serious SPECWeb96 benchmarking, asks the right questions. The followups can be very interesting.
  - "Re: 2.2.5 optimizations for web benchmarks?" -- Dean Gaudet's response. An Apache insider shares some interesting insights.
  - "[patch] New Scheduler", 9 May 1999 -- Rik van Riel started the thread about possible scheduler changes.

- The smbtorture benchmark lets you test an SMB server like the big boys
- Rik van Riel's Linux Performance Tuning Site
- The Linux Scalability Project
- The C10K Problem -- Why can't Johnny service 10000 clients?
- Banga and Druschel's paper on web server benchmarking
- Linus's "State of Linux" talk at Usenix '99, where he talks about the Mindcraft benchmark and SMP scalability
- My NT vs. Linux Server Benchmark Graphs page
- A post on comp.unix.bsd.freebsd.misc from June '99 which mentions that FreeBSD has SMP scaling properties similar to Linux's on tests like those run by Mindcraft
- Mike Abbott from SGI's performance patches to Apache 1.3.9. Note: Apache 2.0 supports sendfile(), which should help its flat file performance.