Intel finally gets it right
The Core 2 Duo is the product of many years of innovation at Intel, but it is also the redemption for an equal number of years of bad decisions and embarrassing mistakes in engineering, marketing, and planning.
When the Socket 370 Pentium 3 Tualatin core had reached the apparent limitations of its architecture, Intel developed two different upgrade paths for it -- the Pentium M for laptop computers, which explored new ways of adding performance without having to increase the CPU's frequency much; and the Pentium 4, which was a totally new design for desktop systems that made many sacrifices to continue Intel's emphasis on raising the frontside bus frequency to achieve better performance. While there were a few abject failures along the way -- the Mobile Pentium 4 and the Mobile Pentium 4 M to be specific -- the Pentium M developed into the premier laptop CPU because of its reasonable power consumption and excellent performance.
Around the same time, Intel began making wireless network chips and developed the Centrino brand to represent a laptop computer that had both an Intel Pro Wireless (IPW) network chip and a Pentium M processor. Meanwhile, the Pentium 4 stayed competitive with the AMD Athlon XP (which used easily understood performance ratings while Intel stubbornly insisted on the more traditional and well-known frequency ratings) and hugely successful Athlon 64 processors by offering enhanced multimedia extensions in the form of SSE, SSE2, and SSE3, and greater memory bandwidth (due to the higher frontside bus frequency). But the Pentium 4 architecture was pushed too hard and too far; Intel stretched it to its architectural limits and beyond, with each new core design sucking down embarrassing amounts of electricity while requiring increasingly drastic cooling measures.
When it became impossible to increase the CPU's frequency any more, Intel added more cache memory, Hyper-Threading Technology, and more multimedia functions, and switched from megahertz ratings to processor model numbers, which confused many customers who didn't understand the new chip rating methods or numbers. At the same time, fierce competition from AMD's 64-bit desktop processors significantly eroded Intel's market share, and the future began to look dim for the semiconductor company that made desktop computing popular.
The success of the Pentium M in laptop systems, and the increasingly obvious heat dissipation, power consumption, and performance limitations of the Pentium 4 architecture forced Intel to plot a dramatically different course for its desktop processors. That led to the introduction of the Core Duo, the logical next step for the Pentium M architecture. Instead of the unsuccessful Hyper-Threading Technology, the Core Duo had two discrete processing cores and a respectable amount of cache memory, but it was still 32-bit in a world that is rapidly becoming 64-bit, and it wasn't available for desktop systems (outside of Apple). The Core Duo exists only in a socket M package, which is not used in desktop motherboards, but is frequently found in laptop systems and has been the crown jewel in Apple's new Intel-based Macintosh computers.
And now we have the Core 2 Duo, the logical evolution of the Core Duo, available in a desktop FC-LGA 775 package. It takes the outstanding Core Duo laptop processor architecture and the high-end features of the Pentium D and combines them to achieve high performance while keeping heat generation and power consumption (the two being intrinsically linked) at a tolerable level.
After many years, thus ends the reign of the Pentium brand. Personally, I'm glad to see it go -- I think it's much easier to keep track of processor architectures and form factors when each distinct generation has its own brand identity.
The 64-bit advantage
The performance advantage of 64-bit x86-64 processors is something I have written much about since the introduction of the first Athlon 64 CPU. Here's the condensed version:
The Core 2 Duo, like the Pentium D before it, is based on Intel's Extended Memory 64 Technology (EM64T), also called AMD64 by AMD and known generically as x86-64. Basically it is the old x86 architecture (called "general purpose instructions" and commonly referred to as the IA32 instruction set architecture, or ISA) plus the old x87 floating point instructions (which are deprecated now but still used by some older 16- and 32-bit programs), 64-bit media instructions (à la MMX and 3DNow!) and these significant enhancements:
- Increased number of general purpose registers
- 64-bit addressing
- 128-bit (SSE, SSE2, SSE3) media instructions
- Improved physical and virtual memory management
The EM64T ISA includes twice as many general purpose registers as the old x86 design, and all of them are twice as wide: 64 bits as opposed to 32. The instruction pointer (a pointer is a variable that contains an address rather than data) also widens from 32 to 64 bits.
Having more and wider general purpose registers means that memory can be used much more efficiently and memory traffic can be minimized, which in turn allows compilers to compile programs to work much faster on your machine.
64-bit addressing means that the physical memory limitation rises to 1TB (that's 1024GB) from the 32-bit limit of 4GB. The processor can also work with longer instructions. To really notice this advantage, you have to stress the system to a degree that most desktop users don't with current software, but as desktop applications demand more from processing hardware, this advantage will become much more important. The advantages to 64-bit addressing in a workstation or server machine are more obvious as they regularly deal with CPU-intensive work.
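The two memory limits mentioned above are just powers of two. As a rough sketch, and assuming the 1TB figure corresponds to a 40-bit physical address width (the width is my assumption, not something stated above), the arithmetic in binary units looks like this:

```python
GIB = 2 ** 30  # one gigabyte in binary units (1,073,741,824 bytes)


def addressable_bytes(address_bits):
    """Return how many distinct byte addresses fit in the given address width."""
    return 2 ** address_bits


# 32-bit addressing: the classic 4GB ceiling.
limit_32 = addressable_bytes(32)

# Assumption: 1TB of physical memory implies 40 physical address bits.
limit_40 = addressable_bytes(40)

print(limit_32 // GIB)       # 4
print(limit_40 // GIB)       # 1024
print(limit_40 // limit_32)  # 256
```

In other words, under that assumption the physical ceiling grows by a factor of 256.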
128-bit media instructions refer specifically to Intel's SSE, SSE2, and SSE3 (Streaming SIMD -- Single Instruction Multiple-Data -- Extensions) technologies. These instructions are very useful for working with large blocks of data, which benefits anyone who deals with a lot of scientific data or high-performance media (streaming high-resolution video, image processing, 3D rendering, and speech recognition) or anything that uses floating-point math.
EM64T deals with both physical and virtual memory in a much more sensible manner than x86, treating the entire virtual memory space as one unsegmented block and eliminating a lot of translation layers from the process of addressing physical memory. Previously x86 would segment virtual memory into small blocks for use with different programs and functions, but this ended up being inefficient and rarely used by software. EM64T eliminates that inefficiency by letting the software choose how it will handle virtual memory (which it does anyway, even if the virtual memory is segmented). This translates into lower latency and faster performance when dealing with both physical and virtual memory.
Performance and enhanced capabilities aside, the most valuable feature of the EM64T is its ability to run 32-bit x86 binaries without a separate processor or operating system (though 64-bit operating systems will need to have 32-bit compatibility libraries installed in order to use 32-bit programs). This makes it much easier to slowly transition from a 32-bit to a 64-bit environment without having to change software applications. While AMD64 and EM64T processors are still very fast while in 32-bit mode, you won't be able to take advantage of any of the above-mentioned features and expanded resources (with the exception of the SSE/SSE2/SSE3 instructions) if you're running a 32-bit operating system. Even if you're going to be running 32-bit binary programs, it pays to have a 64-bit operating system underneath them so that the rest of the system can run more efficiently.
Performance
The first thing you should know (especially if you skipped the previous section on how 64-bit computing makes a difference) is that the performance difference between running a fully 64-bit operating system and a 32-bit operating system can be anywhere from nothing (if you don't do any real computing) to a 100% increase or more (if you do any video or audio encoding or other things that require a powerful CPU and a lot of RAM). Secondly, if you're serious about 64-bit computing, you're going to have to kiss Windows goodbye -- it has the industry's lousiest EM64T/AMD64 support. I highly recommend Mandriva Linux PowerPack Edition for desktop use and OpenBSD for servers.
So let's take a look at how the new Core 2 Duo performs. I hate synthetic benchmark tests and unless there's a very good reason to use them, I will always publish real-world benchmarks instead. Below you'll see two programs in use: oggenc, which encodes WAV audio files into the OGG/Vorbis format (you can expect similar results for MP3 encoding/decoding), and OpenSSL, which you use every time your computer needs to encrypt data (such as through a secure Web form). The OpenSSL speed tests are limited to AES, which is one of the best indicators of 64-bit performance among all of the OpenSSL ciphers. The Athlon 64 X2 and Core 2 Duo tests were run from the 64-bit edition of Mandriva Linux 2007, and the Pentium D and Pentium 4 tests were run in 32-bit Gentoo. The Pentium D also had twice as much RAM as the other test machines. Many would say that this is an unfair comparison; I agree to a certain extent, but the Pentium D is a production machine that I could disconnect for a few minutes to run these tests, so that's the way it has to be. The 820 and the P4 3.2E are also low-end in their respective classes, so we're not really comparing the best-of-breed here. What we are doing is showing how much faster the Core 2 Duo is than three systems that you might reasonably have on your desk right now.
The OpenSSL numbers are in thousands of bytes per second processed, so higher numbers mean greater performance. The oggenc test results were measured with the standard Unix/BSD/GNU time command, and the test data was The Doors' LA Woman album, ripped with cdparanoia -Bw with default settings from the command line. Lower times mean higher performance.
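For readers who want to reproduce the timing side of this, here is a minimal sketch of what the time command measures, written in Python. The function name time_command is my own invention, the resource module it relies on is Unix-only, and the oggenc file name in the comment is hypothetical; this is an illustration, not the exact procedure I used.

```python
import resource
import subprocess
import sys
import time


def time_command(argv):
    """Run argv as a child process and return (real, user, system) in seconds."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(argv, check=True)  # blocks until the child exits
    real = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    user = after.ru_utime - before.ru_utime    # CPU time in the program's own code
    system = after.ru_stime - before.ru_stime  # CPU time spent in the kernel
    return real, user, system


# Runnable demonstration: time a trivial child (the Python interpreter itself).
# A real run would look like time_command(["oggenc", "track01.cdda.wav"]),
# where the file name is hypothetical.
real, user, system = time_command([sys.executable, "-c", "pass"])
print(f"real {real:.3f}s  user {user:.3f}s  sys {system:.3f}s")
```

Real time is wall-clock time, so it also includes any time the process spends waiting on the disk or on other processes.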
OpenSSL running on an Athlon 64 X2 3800+ processor in an Asus A8N-E motherboard with a Corsair TwinX LL DDR-400 1024MB set (two tested 512MB modules):
Cipher | 16 bytes | 64 bytes | 256 bytes | 1024 bytes | 8192 bytes |
aes-128 cbc | 83711.88k | 86110.78k | 87626.25k | 88013.83k | 88133.09k |
aes-192 cbc | 75244.86k | 76571.56k | 77698.76k | 77985.74k | 78074.84k |
aes-256 cbc | 67943.89k | 68785.95k | 69749.08k | 69994.72k | 70070.49k |
OpenSSL running on an Intel Pentium 4 3.2E in an Intel D875PBZLK motherboard with Corsair TwinX-LL DDR-400 1024MB kit (two tested 512MB modules):
Cipher | 16 bytes | 64 bytes | 256 bytes | 1024 bytes | 8192 bytes |
aes-128 cbc | 79154.59k | 78949.15k | 81544.36k | 81947.53k | 82485.71k |
aes-192 cbc | 70636.01k | 70085.72k | 71894.33k | 72926.05k | 73213.11k |
aes-256 cbc | 62936.16k | 63083.62k | 64930.03k | 65524.64k | 65603.78k |
OpenSSL running on an Intel Pentium D 820 in an Asus P5WD2 motherboard with four Crucial DDR2-533 512MB modules:
Cipher | 16 bytes | 64 bytes | 256 bytes | 1024 bytes | 8192 bytes |
aes-128 cbc | 75613.37k | 104218.54k | 116415.31k | 119441.27k | 122052.61k |
aes-192 cbc | 67167.84k | 89894.06k | 100509.44k | 103326.02k | 103768.06k |
aes-256 cbc | 61230.37k | 79156.29k | 87579.98k | 88962.73k | 90608.98k |
OpenSSL running on an Intel Core 2 Duo E6700 in an Asus P5B motherboard with two 512MB GEiL PC2-5300 RAM modules:
Cipher | 16 bytes | 64 bytes | 256 bytes | 1024 bytes | 8192 bytes |
aes-128 cbc | 131805.21k | 140084.86k | 141458.46k | 142305.62k | 140692.14k |
aes-192 cbc | 115836.18k | 122063.18k | 123798.27k | 124110.85k | 122719.43k |
aes-256 cbc | 103268.94k | 108615.40k | 109983.49k | 109789.47k | 109510.66k |
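To put the tables in perspective, the following sketch computes how much faster the Core 2 Duo is than each of the other systems, using only the 8192-byte column copied from the tables above. The dictionary layout and the speedup function are illustrative choices of mine, and the ratios describe AES throughput only.

```python
# 8192-byte AES-CBC results from the tables above (thousands of bytes per second).
results = {
    "Athlon 64 X2 3800+": {"aes-128": 88133.09, "aes-192": 78074.84, "aes-256": 70070.49},
    "Pentium 4 3.2E": {"aes-128": 82485.71, "aes-192": 73213.11, "aes-256": 65603.78},
    "Pentium D 820": {"aes-128": 122052.61, "aes-192": 103768.06, "aes-256": 90608.98},
    "Core 2 Duo E6700": {"aes-128": 140692.14, "aes-192": 122719.43, "aes-256": 109510.66},
}


def speedup(baseline, cipher="aes-128", contender="Core 2 Duo E6700"):
    """Return contender throughput divided by baseline throughput."""
    return results[contender][cipher] / results[baseline][cipher]


for cpu in ("Athlon 64 X2 3800+", "Pentium 4 3.2E", "Pentium D 820"):
    ratios = ", ".join(
        f"{cipher} x{speedup(cpu, cipher):.2f}"
        for cipher in ("aes-128", "aes-192", "aes-256")
    )
    print(f"E6700 vs {cpu}: {ratios}")
```

A ratio of 1.50 would mean the E6700 moves 50% more data per second than the baseline CPU for that cipher.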
Now for the oggenc tests. All times are listed in seconds, and each number represents the real time (the total elapsed time), the user time (the CPU time spent executing the utility's own code), or the system overhead time (the CPU time the kernel spends on the utility's behalf):
[Figure: Oggenc Real Time]
The User and System times are not as important as the Real time listed above; the Real time is the total time elapsed for the test, and it's really the only time that matters to you as an end user. A high System time can mean inefficiencies in the hardware design or a lack of optimization in the operating system.
[Figure: Oggenc User Time]
[Figure: Oggenc System Time]
So the Core 2 Duo E6700 on a 64-bit operating system is roughly twice as fast as the Pentium 4 3.2E and the Pentium D 820 on 32-bit operating systems, and about 50% faster than the Athlon 64 X2 3800+ on a 64-bit OS -- that's incredible. Intel has not had a CPU this much faster than the previous generation since the introduction of the original Pentium processor. It doesn't quite bury the Athlon 64 X2 as badly, but it's still significantly faster. Also, I'll mention once again that the three CPUs being compared to the E6700 are the slowest in their class, but they are processors that you could reasonably expect to find in many of today's home computers -- even new ones.
Power consumption
Below are power usage tests that compare the Pentium D 820, AMD Athlon 64 X2 3800+, and the Core 2 Duo E6700 (the Pentium 4 system is long gone, and at the time that I had it, I didn't do power consumption tests). The Pentium D and Athlon 64 X2 systems used an Asus P5WD2 and an Asus A8N-E motherboard, respectively; an Antec TrueBlue 480 power supply; 1GB of DDR2-533 (Pentium D) or DDR-400 (Athlon 64 X2) RAM in two modules; one Seagate SATA-V hard drive; a Matrox G550 1X PCIe video card; and a Lite-On 52X CDRW/DVD-ROM drive. The Core 2 Duo used an Asus P5B motherboard with 1GB of GEiL DDR2-800 RAM in two modules, with all other components being the same.
Electricity consumption is generally measured in kilowatt hours (kWh), though my test period was only 15 minutes, so the actual usage is measured in watt hours, while the projected monthly usage (assuming 24x7 operation with frequent periods of inactivity) is in kilowatt hours. I measured electricity usage via the Watts Up Pro, a watt meter designed specifically for measuring power consumption of consumer devices. I calculated the costs based on the average for the state of Florida in the year 2002: $0.0731 per kWh (a little more than seven cents). That price has gone up dramatically since then, but the point is to show approximate differences, not precise estimates.
CPU (with the rest of the system) | Watt hours (15 minutes) | Average monthly kWh | Average monthly cost | Min/Max watts measured |
AMD Athlon 64 X2 3800+ | 28.4 | 81 | $5.91 | 95/168 |
Intel Pentium D 820 | 42.6 | 123 | $8.97 | 156/264 |
Intel Core 2 Duo E6700 | 40.6 | 117.0 | $8.55 | 133.4/196.6 |
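The monthly figures can be approximated from the 15-minute readings with simple arithmetic. The sketch below assumes a 30-day month and a straight linear extrapolation of the 15-minute sample; both are my assumptions for illustration, and the function names are made up.

```python
RATE_PER_KWH = 0.0731       # dollars per kWh, the 2002 Florida average quoted above
PERIODS_PER_HOUR = 4        # each reading covers one 15-minute test period
HOURS_PER_MONTH = 24 * 30   # assumption: a 30-day month of 24x7 operation

# Watt hours measured over 15 minutes, copied from the table above.
measured_wh = {
    "AMD Athlon 64 X2 3800+": 28.4,
    "Intel Pentium D 820": 42.6,
    "Intel Core 2 Duo E6700": 40.6,
}


def monthly_kwh(wh_per_period):
    """Extrapolate one 15-minute reading (in Wh) to a month, in kWh."""
    return wh_per_period * PERIODS_PER_HOUR * HOURS_PER_MONTH / 1000


def monthly_cost(wh_per_period):
    """Projected monthly cost in dollars at RATE_PER_KWH."""
    return monthly_kwh(wh_per_period) * RATE_PER_KWH


for name, wh in measured_wh.items():
    print(f"{name}: {monthly_kwh(wh):.1f} kWh, ${monthly_cost(wh):.2f}")
```

Under these assumptions the Pentium D and Core 2 Duo rows come out at about 123 and 117 kWh, in line with the table. The Athlon 64 X2 row comes out a little higher than the 81 kWh shown, so the table's figures presumably were not produced by exactly this extrapolation.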
Again -- the Pentium D and Athlon 64 X2 are the slowest, lowest-power CPUs in their classes, and the high-end models can draw as much as three times more electricity. The E6700 is the top CPU in its class as of this writing, and it still draws less power than its desktop predecessor while doubling performance. This is, to my knowledge, the first time Intel has ever successfully accomplished this feat with a desktop CPU.
Secondly, the Athlon 64 X2 3800+ delivers roughly two-thirds of the Core 2 Duo's performance (the E6700 being about 50% faster), yet it draws about 70% of the electricity (28.4 versus 40.6 watt hours in the test above). That makes the Core 2 Duo the most efficient high-end desktop processor I've ever tested.
Motherboard and chipset compatibility
There are now several motherboards that can accommodate either a Pentium D or a Core 2 Duo processor. Some require a BIOS upgrade, though, and to do that you need a CPU that the motherboard can work with. In other words, you'd need a Pentium D processor to boot with, then you'd have to upgrade the BIOS, then you could install a Core 2 Duo. This option is only viable for people who own these motherboards and wish to upgrade from a Pentium D to a Core 2 Duo. Check your motherboard documentation and the manufacturer's Web site carefully before you buy anything. In some cases, a newer revision of the BIOS is already installed on the board. If you buy from a local computer parts store, you can check the motherboard revision number in person before you buy it, or possibly have store technicians update the BIOS for you before you take the board home. It's a bad idea to buy a board from an online retailer if you need a specific revision -- you're likely to get the wrong one, and online retailers are unlikely to be sympathetic to your situation.
RAM support varies, but as of this writing it's all DDR2 across the board. Some chipsets can do DDR2-533 and some can do DDR2-667 or DDR2-800. The actual frontside bus (the physical pathway between the CPU and the chipset, through which the CPU reaches the RAM on Intel systems) operates at an effective frequency of 1066MHz, so that should be the theoretical limit of the RAM frequency as well.
As far as operating system support is concerned, you definitely want a 64-bit OS for maximum performance. Forget Windows XP -- if you need to stay with Windows, it'll have to be Windows Vista. Otherwise, give GNU/Linux a shot. If maximum performance doesn't matter to you, you're probably reading the wrong article.
All modern operating systems will easily support dual-core processors, and all the ones worth using will also be 64-bit.