32-Bit Arrives
Here we will look at the beginning of the 32-bit CPU era, when the 386 and 486 were king.
Intel 80386
The first 32-bit processor (i.e. the width of the general purpose registers was 32 bits) to be used in a PC was the 80386DX, classified as a third generation processor. It was released in 1985 with a clock speed of 16MHz, although later editions reached higher speeds. The data bus and address bus widths were both increased to 32 bits, allowing a memory fetch of four bytes at a time, and permitting access to a huge 4GB of memory.
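The figures above follow directly from the bus widths. As a minimal sketch (the function names are purely illustrative, not from any library), the arithmetic looks like this:

```python
def addressable_bytes(address_bus_bits: int) -> int:
    """Number of distinct byte addresses an address bus of this width can select."""
    return 2 ** address_bus_bits


def bytes_per_fetch(data_bus_bits: int) -> int:
    """Bytes moved in a single transfer over a data bus of this width."""
    return data_bus_bits // 8


GIB = 2 ** 30  # bytes in one gigabyte (binary definition)

print(addressable_bytes(32) // GIB)  # 4  -> 4GB of addressable memory
print(bytes_per_fetch(32))           # 4  -> four bytes per memory fetch
```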

The 386DX significantly outperformed the 286, partially through increased clock speeds (25 and 33MHz versions of the 386DX were common), but also through the introduction of a pipelined architecture; the 386DX pipeline was 4 steps deep. (For a description of pipelining, check out the CPU Optimisation page.) There was no built-in floating point unit, but (as with previous chips) an optional coprocessor could be used to handle x87 instructions: the 80387.
The increased speed available through the 386 paved the way for multitasking GUI operating environments on the PC. Microsoft Windows 3.x became a popular working environment. While still sitting on top of DOS, Windows made use of protected mode and increased memory availability.
In the late 80s, towards the end of the lifespan of the 386DX, Intel released a 'cut down' version of the chip, called the 386SX. The only real difference was that the SX version used a 16-bit data bus. While still a 32-bit processor internally, data transfer in and out of the CPU happened at half the speed of the DX, resulting in an estimated overall reduction in performance of approximately 25%. This chip was aimed at a slightly more 'budget' market. Actually, my first ever PC was a 386SX-25.
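To see why the narrower data bus halves the transfer rate, here is a small sketch (the function name is hypothetical) that counts how many bus transfers are needed to move a given number of bytes:

```python
import math


def bus_transfers(num_bytes: int, data_bus_bits: int) -> int:
    """Transfers needed to move num_bytes over a data bus of the given width."""
    bytes_per_transfer = data_bus_bits // 8
    return math.ceil(num_bytes / bytes_per_transfer)


# One 32-bit (4-byte) value:
print(bus_transfers(4, 32))  # 1 transfer on a 386DX-style 32-bit bus
print(bus_transfers(4, 16))  # 2 transfers on a 386SX-style 16-bit bus
```

The overall slowdown is smaller than a factor of two because the processor does not spend all of its time moving data across the bus.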
The 386 era also saw a shift from the de facto IBM PC to the generic 'PC clone' market. No longer was it so important to have a branded 'box', as people began to realise that it was the combination of components inside the box that mattered, not the label on the front.
Intel 80486
Intel's 4th generation CPU, the 486DX - released in 1989 - was nearly twice as fast as an equivalently clocked 386DX. However, this performance gain was not achieved through increases in data bus or address bus width. Instead, many architectural changes were made within the chip to optimise how instructions were fetched from memory and subsequently executed. Indeed, the 486 shows considerable RISC traits and implements many performance enhancing features that are common to RISC-based technology. The greatly increased internal complexity of the chip is clear from the fact that it is made from over 1 million transistors (some four times as many as were used in the 386 chips).
The pipeline was increased to 5 steps. A floating point unit was integrated into the core (although not in the later 'budget' 486SX version). Also, the ability to use burst mode on memory access was implemented. Basically, this means that multiple reads of memory are done following a single memory address generation, greatly speeding up the access of memory.
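The benefit of burst mode can be sketched with some simple clock counting. The timings below are illustrative assumptions rather than figures from this page: a 16-byte block read as four 32-bit transfers, 2 clocks for each stand-alone transfer (address, then data), and 2 clocks for the first transfer of a burst followed by 1 clock for each of the rest.

```python
def clocks_without_burst(transfers: int, clocks_per_transfer: int = 2) -> int:
    """Every transfer pays for its own address generation (assumed 2 clocks each)."""
    return transfers * clocks_per_transfer


def clocks_with_burst(transfers: int, first: int = 2, following: int = 1) -> int:
    """Only the first transfer pays for address generation; the rest follow back to back."""
    if transfers <= 0:
        return 0
    return first + (transfers - 1) * following


transfers = 4  # assumed: 16 bytes over a 32-bit data bus
print(clocks_without_burst(transfers))  # 8
print(clocks_with_burst(transfers))     # 5
```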
Perhaps the greatest performance gain was achieved through the use of level 1 cache (L1, aka 'internal' cache) memory: 8kB of L1 cache memory is located on the chip itself. Instructions fetched from main memory could be cached here, significantly reducing the wait time for a typical instruction. Some 90% of instructions could be fetched from the fast L1 cache, rather than from the slow system memory. Additionally, motherboard architecture became more complicated at this time, and some motherboards made use of off-die (i.e. not on the chip itself) level 2 (L2) cache.
Why is cache so efficient? Well, it turns out that for any program running in memory, only a small fraction of the instructions in the program are performed frequently. Cache works by storing these frequently and recently used instructions. Thus, when the CPU needs to perform such an instruction again, it can find it in the fast local cache and therefore not have to wait for the slow access to main memory.
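A toy simulation makes the point. This is only a sketch under loud assumptions, not a model of the 486's real cache: each instruction address is cached individually, the replacement policy is least-recently-used, and the program trace (200 one-off setup instructions followed by a 20-instruction loop run 100 times) is invented for illustration.

```python
from collections import OrderedDict


def hit_rate(trace, capacity):
    """Fraction of accesses in trace that are served from an LRU cache of the given size."""
    if not trace:
        return 0.0
    cache = OrderedDict()  # keys are addresses, ordered from least to most recently used
    hits = 0
    for addr in trace:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)        # mark as most recently used
        else:
            cache[addr] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used entry
    return hits / len(trace)


# Hypothetical program: setup code that runs once, then a small hot loop.
setup = list(range(200))
loop = list(range(1000, 1020)) * 100
trace = setup + loop

print(f"{hit_rate(trace, capacity=64):.1%}")  # 90.0%
```

Only the setup code and the first pass through the loop miss the cache; every later pass through the loop is a hit, because the loop fits comfortably inside the cache.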
Why is cache so fast? Firstly, cache memory is in the form of static RAM, which is much quicker than the dynamic system RAM. Secondly, the amount of cache memory is much smaller than that of system memory, with the result that the cache can be searched much more quickly than the system memory. Note that the speed at which cache memory can be accessed is limited: L1 cache, being on the same die, can be accessed at the full CPU clock speed, while L2 cache was typically accessed at the memory bus speed. Of course, on the chips discussed so far, the core CPU clock speed and memory bus speed were the same. For later chips, this was soon no longer true...
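The combined effect of a fast cache and a high hit rate can be estimated as a weighted average. The clock counts used here (1 clock for a cache hit, 4 clocks for a trip to main memory) are hypothetical values chosen for illustration, and the model is deliberately simplified: a miss is charged the main memory cost only.

```python
def average_access_clocks(hit_rate: float, cache_clocks: float, memory_clocks: float) -> float:
    """Average cost of a memory access, weighted by how often the cache is hit."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cache_clocks + (1.0 - hit_rate) * memory_clocks


# Assumed timings: 1 clock per cache hit, 4 clocks per main memory access.
for rate in (0.0, 0.5, 0.9):
    print(rate, round(average_access_clocks(rate, 1, 4), 2))
```

With a 90% hit rate, the average access under these assumptions costs about 1.3 clocks, much closer to the cache speed than to the main memory speed.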
Take a look at CPU Optimisation - Cache for more details.

In an attempt to extract more speed out of the 486 CPU, clock-doubling technology arrived in the early 90s. This technology allows the CPU to run at a multiple of the memory bus speed. Thus a gain is achieved in CPU speed, without having to tackle the tricky problem of getting the memory bus speed (limited by interfacing and by the motherboard itself) to match. While the CPU itself runs quicker, the wait time for off-die L2 cache and main memory access is not improved.
Two popular chips were the 486DX2-50 and the 486DX2-66, which both used clock-doubling technology. Let's look at the 486DX2-66: the core clock speed was 66MHz but the system clock speed (and therefore memory bus speed) was still 33MHz. At a glance you would think that the DX2-66 was twice as fast as a 33MHz 486DX, when in fact the performance gain was slightly less than this due to the bottleneck in memory access. Intel also released the 486DX4, running at a clock speed of up to 100MHz. Despite the name, these chips used clock-tripling technology; a 100MHz 486DX4 ran at three times a system clock speed of 33.3MHz.
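The relationship between bus speed, multiplier and core speed, and the reason a doubled clock gives less than double the performance, can be sketched as follows. The speed-up estimate is an Amdahl-style approximation, and the 80% 'core-bound' fraction is a hypothetical figure chosen for illustration, not a measurement.

```python
def core_clock_mhz(bus_mhz: float, multiplier: int) -> float:
    """Core clock speed is the system (bus) clock times the clock multiplier."""
    return bus_mhz * multiplier


def estimated_speedup(multiplier: int, core_bound_fraction: float) -> float:
    """Only the fraction of run time spent inside the core scales with the multiplier;
    the remainder (waiting on L2 cache and main memory) runs at bus speed as before."""
    memory_bound_fraction = 1.0 - core_bound_fraction
    return 1.0 / (memory_bound_fraction + core_bound_fraction / multiplier)


# (bus clock in MHz, multiplier) for the chips discussed above
chips = {"486DX2-66": (33.0, 2), "486DX4-100": (33.3, 3)}
for name, (bus, mult) in chips.items():
    print(name, round(core_clock_mhz(bus, mult)), "MHz core")

# Assumed: 80% of run time is core-bound, 20% is spent waiting on the memory bus.
print(round(estimated_speedup(2, 0.8), 2))  # 1.67, i.e. less than 2x
```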
It should be noted that these clock-doubled chips ran very hot. Hence the use of a heat sink became necessary to dissipate the heat.

Rival 4th Generation CPUs
By this time, AMD (Advanced Micro Devices) and Cyrix had been making x86 clones for some years, but without much success. The AMD 5x86, released in the mid-90s, was not really an exception. Despite its name, it was not a 5th generation chip, nor did it use a 5x clock multiplier. This chip used clock-quadrupling technology to run at 133MHz on a 33MHz 486 board. It was also known as a 486DX5 (presumably following on from Intel's DX4 being a clock-tripler). While this chip did offer quite a performance gain over Intel's 486 chips, Intel had already released its Pentium. While AMD claimed their 5x86 could rival an Intel Pentium 75 (hence its somewhat misleading alternative name, the 5x86-P75), there really wasn't much room in the market for AMD at the time.
Now let's move on to look at the processor that has become one of the largest tradenames in history: the Pentium.
| Just Too Good Last updated: June, 2006 (DJL) |