Higher Level CPU Architecture

We've just seen that an instruction must be fetched from memory or cache, decoded by the instruction decoder and then executed by the ALU. This raises an obvious question: how does the ALU know what to do? This section answers that question.


From Control Unit to ALU

The simplistic view (as shown in the Simplified View of CPU Architecture page) implies that the instruction decoder determines which instruction the ALU should perform and then sends the appropriate control signals to the ALU.

This is true in part, but it is an oversimplification. In fact, the control signals sent to the ALU (which also control various other parts of the CPU, such as gating data onto the buses) come from a more complicated component called the control unit:

The Control Unit

The instruction decoder takes the current instruction fetched from memory and outputs a unique signal to the encoder; there is a separate signal for every possible instruction. The encoder is at the heart of the control unit. It takes as inputs the decoder's output, the CPU clock, the register flags and various other signals, and generates control signals accordingly. These control signals determine which actions then take place within the CPU.
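As a rough illustration, the decoder-then-encoder arrangement can be modelled in a few lines of Python. This is a minimal sketch, not a description of any real CPU: the opcodes (`ADD`, `SUB`, `JZ`), the control signal names and the two-phase clock are all hypothetical. It only shows the idea that the decoder raises exactly one line per instruction, and the encoder combines that line with the clock and the flags to decide which control signals to assert.

```python
# Hypothetical instruction set: each opcode gets its own decoder output line.
OPCODES = ["ADD", "SUB", "JZ"]


def decode(opcode: str) -> list[int]:
    """Decoder: raise exactly one output line (one-hot) for the fetched instruction."""
    return [1 if op == opcode else 0 for op in OPCODES]


def encode(lines: list[int], clock_phase: int, zero_flag: bool) -> set[str]:
    """Encoder: combine decoder lines, clock phase and flags into control signals.

    All signal names are invented for this illustration.
    """
    add, sub, jz = lines
    signals = set()
    if clock_phase == 0 and (add or sub):
        # Phase 0: gate the two operand registers onto the ALU input buses.
        signals |= {"REG_A_TO_BUS", "REG_B_TO_BUS"}
    if clock_phase == 1 and add:
        signals |= {"ALU_ADD", "ALU_OUT_TO_REG"}
    if clock_phase == 1 and sub:
        signals |= {"ALU_SUB", "ALU_OUT_TO_REG"}
    if clock_phase == 1 and jz and zero_flag:
        # A conditional jump only loads the program counter if the zero flag is set.
        signals.add("LOAD_PC")
    return signals


print(decode("SUB"))                             # [0, 1, 0]
print(sorted(encode(decode("ADD"), 1, False)))   # ['ALU_ADD', 'ALU_OUT_TO_REG']
print(encode(decode("JZ"), 1, False))            # set()
```

Note how the same decoded instruction (`JZ`) produces different control signals depending on a flag, which is exactly why the encoder needs inputs beyond the decoder's output.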

The Role of the ALU

Recall that the ALU cannot store data. Instead, it manipulates its inputs to generate an output. The arithmetic and logic unit handles, as its name suggests, logical operations (such as bitwise comparisons and evaluations) and simple integer arithmetic, such as addition, multiplication, division and bit shifting. This last operation requires some brief explanation...
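The statement that the ALU stores nothing can be pictured as a pure function: the same inputs always give the same output, and any result that must be kept is written to a register elsewhere. The sketch below assumes an 8-bit data path and uses hypothetical operation names for a handful of the operations mentioned above; it returns the result together with a zero flag.

```python
MASK = 0xFF  # assumption: an 8-bit data path


def alu(op: str, a: int, b: int) -> tuple[int, bool]:
    """Combinational ALU model: the output depends only on the inputs; nothing is stored.

    Operation names are hypothetical. Returns (8-bit result, zero flag).
    """
    if op == "ADD":
        result = a + b
    elif op == "SUB":
        result = a - b
    elif op == "AND":
        result = a & b
    elif op == "OR":
        result = a | b
    elif op == "XOR":
        result = a ^ b
    elif op == "SHL":
        result = a << b
    elif op == "SHR":
        result = a >> b
    else:
        raise ValueError(f"unknown operation {op!r}")
    result &= MASK              # keep only the low 8 bits, as fixed-width hardware would
    return result, result == 0  # the flag would be latched into a flags register elsewhere


print(alu("ADD", 200, 100))        # (44, False): 300 does not fit in 8 bits
print(alu("SUB", 5, 5))            # (0, True)
print(alu("AND", 0b1100, 0b1010))  # (8, False)
```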

A binary value can be shifted left or right. When a value is shifted left, a 0 is added on the right-hand side; when it is shifted right, a 0 is added on the left-hand side. For example, take the value 01100101B. If we shift this one position to the left, we get 11001010B. Similarly, if we shift this second value one position to the right, we get back to where we started.
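The two shifts in this example can be reproduced with Python's `<<` and `>>` operators. Python integers have no fixed width, so this sketch masks with `0xFF` to mimic an 8-bit register; the helper names `shl8` and `shr8` are hypothetical.

```python
def shl8(value: int, n: int = 1) -> int:
    """Shift an 8-bit value left: zeros enter on the right, bits leaving the left are lost."""
    return (value << n) & 0xFF


def shr8(value: int, n: int = 1) -> int:
    """Logical right shift: zeros enter on the left, bits leaving the right are lost."""
    return (value & 0xFF) >> n


x = 0b01100101
left = shl8(x)
print(f"{left:08b}")        # 11001010
print(f"{shr8(left):08b}")  # 01100101
```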

It turns out that bit shifting allows very efficient multiplication and division, provided we are multiplying or dividing by a power of 2 (2, 4, 8 and so on): each shift left by one position multiplies by 2, and each shift right by one position divides by 2. For example, 01100101B has a decimal value of 101. (If you're not sure why, make sure you check out the section on binary.) When we shift this one position to the left, we get 11001010B, which has a decimal value of 202. This is very efficient because bit shifting can be performed by the processor in very few clock cycles, whereas general multiplication and division by arbitrary values require many more clock cycles and are therefore much slower.

Note that there are situations where the length of the field is limited, e.g. to 8 bits. If you shift an 8-bit binary value to the left when its most significant bit is a 1, that 1 simply gets lost off the end. Similarly, when you shift right, you always drop the right-most (least significant) bit. If this bit was a 0, the division by two is exact; when shifting by more than one position, the division by the corresponding power of two is exact only if every bit shifted out was a 0. If the least significant bit was a 1, however, it is lost: the result of the division is a whole number, without the remainder. Hence shifting 01100101B one position to the right results in 00110010B (the right-most bit has been lost), which is 50 in decimal. Note that 101 divided by 2 is indeed 50, with a remainder of 1.
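The arithmetic above can be checked with a short Python sketch, again assuming an 8-bit register (hence the mask). It shows multiplication and division by powers of two via shifts, a bit lost off the left end, and the remainder lost off the right end. The helper names are hypothetical.

```python
def times_pow2(value: int, k: int, bits: int = 8) -> int:
    """Multiply by 2**k with a left shift, truncated to a fixed register width."""
    return (value << k) & ((1 << bits) - 1)


def div_pow2(value: int, k: int) -> int:
    """Divide a non-negative integer by 2**k with a right shift (remainder discarded)."""
    return value >> k


x = 0b01100101                # 101
print(times_pow2(x, 1))       # 202: still fits in 8 bits
print(times_pow2(x, 2))       # 148: 404 needs 9 bits, so the top bit is lost
print(div_pow2(x, 1), x % 2)  # 50 1: quotient 50, the lost bit was the remainder
print(div_pow2(x, 2), x % 4)  # 25 1

# 6 is a multiple of 2 but not a power of 2, so x * 6 needs two shifts and an add.
print((x << 2) + (x << 1))    # 606
```

The last line is why the restriction to powers of two matters: a single shift only ever multiplies or divides by 2, 4, 8 and so on.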


The Integer Unit vs FPU

The combination of control unit, ALU and associated registers makes up an integer unit. A given processor can in fact have more than one integer unit, allowing it to perform integer instructions in parallel. There will be more on this later when we talk about CPU Optimisation.

Modern CPUs also have one or more floating point units (FPUs). Why? Because the integer ALU is poorly suited to manipulating floating point numbers, i.e. numbers with a fractional part (such as 3.75) or numbers too large or too small to be held as integers. Such numbers are stored as a sign, an exponent and a significand (mantissa), and operating on them involves steps such as aligning exponents that an integer ALU does not perform. Manipulation of this type of data is therefore carried out by the FPU. The FPU is much like an integer unit in terms of its architecture. For example, an FPU will have its own ALU and its own registers. However, the implementation is specific to the manipulation of floating point numbers. To reflect this, FPU registers are designed to hold numbers with a much wider range and greater precision, and therefore tend to be wider than their integer counterparts.
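To see why integer hardware cannot simply be reused, the sketch below takes the bit patterns of two floating point numbers and adds them as if they were integers. It assumes the IEEE 754 single-precision format (which x86 FPUs support) and uses Python's standard `struct` module to reinterpret the bits; the helper names are hypothetical.

```python
import struct


def float_bits(f: float) -> int:
    """Return the IEEE 754 single-precision bit pattern of f as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", f))[0]


def bits_float(b: int) -> float:
    """Interpret a 32-bit unsigned int as an IEEE 754 single-precision float."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


a, b = 1.0, 2.0
print(hex(float_bits(a)), hex(float_bits(b)))  # 0x3f800000 0x40000000

# Adding the raw bit patterns as integers, as an integer ALU would:
print(bits_float(float_bits(a) + float_bits(b)))  # inf

# Proper floating point addition (exponents aligned, significands added):
print(bits_float(float_bits(a + b)))  # 3.0
```

Integer addition of 1.0 and 2.0 produces the bit pattern for infinity rather than 3.0, because the exponent fields are simply summed instead of being aligned.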

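Older CPUs such as the 286 didn't have a built-in FPU, but a separate maths coprocessor (the 80287, in the 286's case) could be added if required. With the 486DX, the FPU was built into the CPU itself, and it has been a standard component ever since (the budget 486SX variant lacked a working FPU). The 486DX had a set of eight 80-bit floating point registers for storing and manipulating floating point numbers.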


Floating Point Instructions

Of course, manipulating floating point numbers requires instructions dedicated to the task. The x86 instruction set includes such instructions, but only CPUs with an FPU (built in or added as a coprocessor) can execute them in hardware.

Indeed, some entire subsets of instructions are dedicated to floating point work, such as AMD's 3DNow! technology. First implemented in the AMD K6-2 chip and later extended, the original 3DNow! instruction set comprised 21 instructions, most of which operate on packed floating point values, designed to speed up multimedia tasks. The most common benefit of 3DNow! was seen during intensive graphics manipulation, such as in 3D games. These game-oriented features helped AMD begin to take market share away from Intel, which had previously dominated the market. As CPUs have evolved, the instruction set has expanded much more rapidly in the area of floating point operations than in the relatively static integer instruction set. More on this later.
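3DNow! instructions work on 64-bit MMX registers that each hold two single-precision floating point values, so one instruction such as PFADD performs two additions at once. The sketch below merely emulates that packed addition in portable Python to illustrate the idea; it does not use real 3DNow! hardware, and the function `pfadd` only borrows the instruction's mnemonic as a hypothetical name.

```python
import struct


def pack2(x: float, y: float) -> int:
    """Pack two single-precision floats into one 64-bit value, like a 64-bit MMX register."""
    return struct.unpack("<Q", struct.pack("<ff", x, y))[0]


def unpack2(reg: int) -> tuple[float, float]:
    """Split a 64-bit value back into its two single-precision lanes."""
    return struct.unpack("<ff", struct.pack("<Q", reg))


def pfadd(dst: int, src: int) -> int:
    """Emulate a packed add: lane 0 + lane 0 and lane 1 + lane 1 in one 'instruction'."""
    (a0, a1), (b0, b1) = unpack2(dst), unpack2(src)
    return pack2(a0 + b0, a1 + b1)


# e.g. moving a 2D point (x, y) by a velocity vector in a single operation
position = pack2(1.5, -2.0)
velocity = pack2(0.25, 4.0)
print(unpack2(pfadd(position, velocity)))  # (1.75, 2.0)
```

Processing several values with one instruction is what made this kind of extension attractive for 3D games, where the same operation is applied to huge numbers of coordinates.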

What's next

We'll now take a look at CPU Optimisation Principles.