IA-32 and Intel® 64 Processor Targeting Overview

The compiler documentation lists many options for optimizing for particular processors or processor families. Some of these are duplicates, or older options retained for compatibility with other or older compilers, which can be confusing. This article summarizes the relationships between the different switches and explains which are the most important and useful in practice.

There are two main categories. The first consists of microarchitecture-specific switches, which generate code that runs fast on some processors or processor families but does not run at all on others. These switches typically make use of additional instruction sets that are not available on all processors. This is by far the most important category.

The second category consists of tuning switches. These may also generate code that runs faster on some processors, but the code runs correctly on all processors. They typically involve more subtle scheduling decisions and do not use additional instruction sets. Multiplication by a power of 2 is an example: on some processors an integer multiply may be fastest, on others a shift may be faster, but all processors support both instructions. By default, the compiler typically tunes for a blend of recent processors.

Category 1a: microarchitecture-specific

/arch:… (-m… or -arch…)
Optimizes for both Intel and compatible non-Intel processors that support the specified instruction set. On other processors, the program may fail at runtime with an illegal instruction error.

/Qx… (-x…)
Optimizes for Intel processors that support the specified instruction set. On other processors, the program gives a runtime error explaining that the executable was not built to run on that processor.

-march…
Optimizes for a limited set of combinations of processors and instruction sets. Not recommended.

Category 1b: fat binaries (microarchitecture-specific code plus an alternative default code path that should work on most or all processors)

/Qax… (-ax…)
Generates one default code path, optimized for any Intel or compatible non-Intel processor that supports SSE2 instructions, plus one or more additional code paths for Intel processors that support the specified instruction set(s). The appropriate path is selected at runtime.

Category 2: tuning only, no extended instruction sets

/tune:… (-tune…)
Fortran only. A tuning switch kept for makefile compatibility; it does not currently influence the generated code.

/G… (-mtune…, -mcpu…)
A tuning switch kept for makefile compatibility; it does not currently influence the generated code.

See the main compiler documentation for the arguments accepted by each of the above switches.

Recommendations:

To optimize for a specific Intel processor, use /Qx… (-x…).

To optimize for a specific Intel or compatible non-Intel processor, use /arch:… (-m…).

To generate optimized code paths for one or more specific Intel processors in addition to a default optimized code path for Intel or compatible non-Intel processors, use /Qax… (-ax…). The properties of the default code path can be changed by combining /Qx… (-x…) or /arch:… (-m…) with /Qax… (-ax…).

The category 2 "tuning only" switches are not recommended with the current generation of compilers.
