Optimizing PennMUSH



What's this page for?

This page covers some hints for optimizing the PennMUSH binary for maximum performance, in the rare case that you need it. Everything here assumes the most common situation: using the gcc compiler on an x86-compatible computer. If that's not the case for you, refer to your compiler documentation for specific details. Some of the material here will still be useful.

You probably don't need any of this. Computers are powerful enough, and Penn is so rarely CPU-intensive, that you can run lots of games on one computer with just the default optimization switches. Memory is by far the limiting hardware factor. But if you still want to get the most bang for your buck, read on.

The basics

If you can, get and install gcc 3.1 or 3.0.4 from http://gcc.gnu.org. Version 2.95 is acceptable, but the 3 series is better. If you're going to be making it your primary compiler, I suggest sticking with the 3.0 series until 3.1.1 comes out. Installing 3.1 separately is fine, though.

Your Makefile is where all compiler switches are set, in the CFLAGS line. It's usually a good idea to edit Makefile.SH as well, so that re-running Configure won't cause you to lose everything.



The first thing to do is say good-bye to any debug information. If you need debug information, you're not going to get the fastest code, but you can still do some things to help: read the rest of this section and the next one. PennMUSH itself is quite stable, so you probably don't need debug information unless you're actively changing the code or using 1.7.5, the unstable development version.

To tell the compiler to stop adding debug information, remove the -g option from CFLAGS. Then, in its place, add -fomit-frame-pointer. This tells gcc to free up another of the x86's limited number of registers for general use, and is probably the single greatest improvement you can have, as gcc likes lots of registers. If you do want debug info and you're using gcc 3.X, you can keep the -g option and add -momit-leaf-frame-pointer. This gives you the advantages of -fomit-frame-pointer in some cases while still keeping useful debugging information.
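The swap described above can be sketched as a small shell function. This is only an illustration, not part of PennMUSH: the name drop_debug_flag is made up, and it works on a flag string you pass in rather than touching your Makefile. You would still paste the result into the CFLAGS line yourself.

```shell
#!/bin/sh
# drop_debug_flag "FLAGS..."   (hypothetical helper, not shipped with Penn)
# Prints FLAGS with -g replaced by -fomit-frame-pointer. If there is no
# -g to replace, -fomit-frame-pointer is appended instead.
drop_debug_flag() {
    out=""
    seen=no
    for flag in $1; do            # unquoted on purpose: split on spaces
        case "$flag" in
            -g|-fomit-frame-pointer)
                # Already emitted once: skip this word to avoid a duplicate.
                [ "$seen" = yes ] && continue
                flag=-fomit-frame-pointer
                seen=yes ;;
        esac
        out="${out:+$out }$flag"
    done
    [ "$seen" = yes ] || out="${out:+$out }-fomit-frame-pointer"
    printf '%s\n' "$out"
}

drop_debug_flag "-g -O"    # prints: -fomit-frame-pointer -O
```

Every other switch is passed through unchanged and in its original order, so whatever else is on your CFLAGS line survives the swap.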

Machine type

Next, you need to tell gcc what model of x86 CPU you're using, with the -march= switch. For Pentium Pros, PIIs, and Celerons, use -march=i686. For plain Pentiums, use -march=i586. For AMD K6s, use -march=k6. For PIIIs and P4s, gcc 3.1 supports -march=pentium3 and -march=pentium4; with older versions of gcc, use i686 for these. For Athlons, gcc 3 supports -march=athlon; with older versions, use i686. In addition, gcc 3.1 supports sub-types of these processors, including pentium-mmx, k6-2, k6-3, pentium2, athlon-tbird, athlon-4, athlon-xp, and athlon-mp. Use the most appropriate one for your chip and compiler version.
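If it helps to see those rules in one place, here is a rough sketch of them as a shell function. The function name march_for and the CPU names it accepts (pentium, ppro, and so on) are invented for this example; only the -march values it prints come from the rules above. It ignores the gcc 3.1 sub-types to stay short.

```shell
#!/bin/sh
# march_for CPU GCC_VERSION   (hypothetical helper)
# Prints a -march= switch following the rules in the text above.
march_for() {
    cpu=$1
    gccver=$2
    case "$cpu" in
        pentium)               arch=i586 ;;
        ppro|pentium2|celeron) arch=i686 ;;
        k6)                    arch=k6 ;;
        pentium3|pentium4)
            # Only gcc 3.1 knows these names; older versions use i686.
            case "$gccver" in
                3.1*) arch=$cpu ;;
                *)    arch=i686 ;;
            esac ;;
        athlon)
            # Any gcc 3 release knows athlon; older versions use i686.
            case "$gccver" in
                3.*) arch=athlon ;;
                *)   arch=i686 ;;
            esac ;;
        *)
            echo "march_for: unknown CPU '$cpu'" >&2
            return 1 ;;
    esac
    printf '%s\n' "-march=$arch"
}

march_for athlon 3.0.4     # prints: -march=athlon
march_for pentium3 2.95    # prints: -march=i686
```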

This tells gcc to use the additional instructions added in the various models and (in theory) to apply instruction-ordering rules that get the best performance for that model. pgcc does a better job at this than gcc 2.95, but neither is particularly stellar compared to, say, Intel's compiler. But then, you can't use Intel's compiler to target anything but Intel and AMD chips, while gcc can compile programs for practically anything. gcc 3, especially 3.1, is getting better at this.

Floating-point numbers are handled faster if aligned on a two-word boundary. However, the old 386 binary interface doesn't do this, which slows code down. Luckily, you can make gcc do the proper alignment in some cases with -malign-double. pgcc also has -mstack-align-double, which you should use as well if you're compiling with pgcc. All other alignment issues should be taken care of by the -march option, I think. If you're using gcc 3.1 and have a chip that can handle SSE and SSE2 instructions, you can also try -mfpmath=sse.

Chips that support SSE or 3DNow! instructions have support for something called prefetching, where memory can be loaded into the chip's cache before it's actually used, so that access is faster when it is. When using gcc 3.1, you can make it do this at times with -fprefetch-loop-arrays. This only works if your -march setting indicates the right kind of CPU; otherwise it does nothing.
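If you're not sure whether your chip has SSE or 3DNow!, a quick check along these lines may help. Treat it as a sketch under assumptions: the function name can_prefetch is made up, and reading the flag list from a "flags" line in /proc/cpuinfo is a Linux convention that other systems won't have. The flag names sse and 3dnow are the ones Linux normally reports.

```shell
#!/bin/sh
# can_prefetch "CPU FLAGS..."   (hypothetical helper)
# Succeeds if the space-separated flag list names SSE or 3DNow! support.
can_prefetch() {
    for f in $1; do               # unquoted on purpose: split on spaces
        case "$f" in
            sse|3dnow) return 0 ;;
        esac
    done
    return 1
}

# Assumption: on Linux, the first "flags" line of /proc/cpuinfo lists the
# CPU features. Elsewhere this block is simply skipped.
if [ -r /proc/cpuinfo ]; then
    cpuflags=$(sed -n '/^flags/{s/^flags[^:]*: *//p;q;}' /proc/cpuinfo)
    if can_prefetch "$cpuflags"; then
        echo "SSE or 3DNow! found: -fprefetch-loop-arrays may help"
    else
        echo "no SSE or 3DNow! flag found"
    fi
fi
```

This only tells you what the chip can do. gcc still decides based on your -march setting, so the switch needs a matching -march value to have any effect.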

General optimization

Next, change the -O option to -O3 (for gcc) or -O6 (for pgcc). These turn on many more optimizations. Then, add -ffast-math. This drops some checks in floating-point math routines (like making sure sqrt()'s argument isn't negative) that we already handle, and fewer checks mean more speed. This would be much more noticeable if mushes did a lot of number crunching.


For gcc 2.95: CFLAGS=-O3 -march=i686 -fomit-frame-pointer -ffast-math -malign-double

For gcc 3.0, with better optimization but still debugging information: CFLAGS=-g -O -march=i686 -momit-leaf-frame-pointer -malign-double

For gcc 3.1: CFLAGS=-O3 -march=athlon -fomit-frame-pointer -malign-double -fprefetch-loop-arrays


Last modified: Sun Jun 2 11:07:55 PDT 2002
