Intel Pentium 4 and i850 chipset reviewed
Inside the NetBurst Micro-Architecture
Intel promotes its NetBurst Micro-architecture as paving the way for an advanced class of next generation computers. In many ways this is true,
although in its current form the claim can be challenged, as the clock speeds available at launch do little to show off the superior
architecture of the Pentium 4. Two questions then spring to mind: firstly, whether it is worth jumping on the bandwagon now, and secondly,
whether Intel's NetBurst Micro-architecture will indeed stand the test of time as higher frequency processors are demanded and
performance becomes the name of the game. We need to clarify what the NetBurst Micro-architecture brings us in the way of NEW technology. As is
often the case in our industry, buzzwords surround every new launch a manufacturer makes, and NetBurst is one of them. The NetBurst
Micro-architecture consists of eight features:
Hyper Pipelined Technology, Rapid Execution Engine, Execution Trace Cache, Advanced Transfer Cache, Advanced Dynamic Execution, Enhanced
floating point/multimedia, Streaming SIMD Extensions 2 and a new 400MHz system bus.
These features are described in detail in the chapter that follows. At this stage, however, we wish to discuss a topic close to all our hearts:
"Performance". Do we truly know which factors determine true processor performance?
What Factors Determine True Processor Performance?
In a day and age when performance is demanded, and manufacturers like Intel and AMD provide us with ever higher clocked processors, it is
easy to be thrown into confusion when we discover that our latest "highest clocked frequency processor" is not as
quick as we anticipated. We have all been there at one stage. This is the outcome of a distinct lack of any explanation or standard that
measures performance against given criteria. Intel, however, has a definition, which measures performance by the time it takes to
execute a given application. It states that true performance is a combination of both clock frequency (MHz) and IPC (instructions per cycle):
Performance = MHz x IPC. This highlights that performance can be improved by increasing frequency, IPC or both. Frequency is
a function of both process technology and micro-architecture; at a given clock frequency, IPC is a function of the processor micro-architecture
and the specific task being executed. In addition to the two methods described above, it is also possible to increase performance by reducing
the number of instructions it takes to execute the specific task being measured. Single Instruction Multiple Data (SIMD) is a technique that
does exactly this. Intel first introduced it in 1996 as 64-bit integer SIMD on the Pentium processor with MMX technology, and subsequently as
128-bit single precision floating point SIMD (SSE) on the Pentium III processor. Applications can be broadly divided into two categories:
integer/basic office productivity applications and floating point/multimedia applications. The IPC achievable by these applications varies
greatly, and is affected by the number of branches that the application code typically takes and the predictability of these branches. The more
hard-to-predict branches that are taken, the higher the chance of mis-predicting and performing non-productive work.
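
To make the relation above concrete, here is a minimal sketch in Python. It treats the time to run a task as the instruction count divided by (frequency x IPC), which is one way of reading Intel's definition. The function name `execution_time_s`, the 3 billion instruction task and every clock and IPC figure are our own illustrative assumptions, not measurements of any real processor.

```python
def execution_time_s(instructions: float, freq_mhz: float, ipc: float) -> float:
    """Seconds needed to run a task of `instructions` instructions.

    Instructions retired per second = MHz * 1e6 * IPC, so
    time = instructions / (MHz * 1e6 * IPC).
    """
    return instructions / (freq_mhz * 1e6 * ipc)


# Hypothetical task: 3 billion instructions on a 1000 MHz part with an IPC of 1.0.
baseline = execution_time_s(3e9, 1000, 1.0)

# Lever 1: raise the clock by 50 percent, IPC unchanged.
faster_clock = execution_time_s(3e9, 1500, 1.0)

# Lever 2: raise IPC by 50 percent, clock unchanged.
higher_ipc = execution_time_s(3e9, 1000, 1.5)

# Lever 3: halve the instruction count (the kind of saving SIMD aims for),
# with clock and IPC unchanged. The factor of two is an assumption.
fewer_instructions = execution_time_s(1.5e9, 1000, 1.0)

for label, seconds in [
    ("baseline", baseline),
    ("faster clock", faster_clock),
    ("higher IPC", higher_ipc),
    ("fewer instructions", fewer_instructions),
]:
    print(f"{label:>20}: {seconds:.2f} s")
```

In this toy model a 50 percent gain in clock and a 50 percent gain in IPC are interchangeable, which is why a design can in principle trade one for the other.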
Integer and basic office productivity applications, such as word processing and spreadsheets, tend to have many branches in
the code that are difficult to predict, which reduces overall IPC potential. As a result, these applications are resistant to improvement by
micro-architectural means such as deeper pipelines. In addition, a significant increase in performance on this type of application does little
to improve the user's experience, as these applications only need to keep pace with the user's own pace of reading and writing, and today's
higher end Pentium III and similar processors suffice. Floating point and multimedia applications are much easier to deal with, as
they have branches that are very predictable and thus a higher IPC potential. As a result, these types of applications scale very well
with frequency and are inclined to benefit from deeper pipelines. In addition, the appetite of these applications for processing power tends to
be insatiable: the more performance, the better the user's experience. The Pentium 4's NetBurst Micro-Architecture is lower on IPC but,
according to Intel, the increase in frequency capability more than makes up for this and delivers higher overall performance to the end user.
This was achieved in the NetBurst Micro-Architecture by implementing Hyper Pipelined Technology, in which the depth of the pipeline was
doubled from that of the P6.
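
The difference between the two application classes can be sketched with a simple stall model in Python. This is a textbook-style approximation and not Intel's method: it adds the cycles lost to mis-predicted branches to the ideal cycles per instruction. The function name `effective_ipc`, the peak IPC of 2.0, the branch fractions, the mis-predict rates and the penalties of 10 and 20 cycles are all hypothetical values chosen only to show the trend.

```python
def effective_ipc(peak_ipc: float, branch_fraction: float,
                  mispredict_rate: float, penalty_cycles: float) -> float:
    """IPC after mis-prediction stalls are added to the ideal cycles per instruction."""
    ideal_cpi = 1.0 / peak_ipc
    # Average stall cycles per instruction caused by mis-predicted branches.
    stall_cpi = branch_fraction * mispredict_rate * penalty_cycles
    return 1.0 / (ideal_cpi + stall_cpi)


# Hypothetical workload profiles: (fraction of instructions that are branches,
# fraction of those branches that are mis-predicted).
workloads = {
    "office-like": (0.20, 0.10),      # many branches, hard to predict
    "multimedia-like": (0.05, 0.02),  # few branches, easy to predict
}

for name, (branch_fraction, mispredict_rate) in workloads.items():
    # Assumption: the mis-predict penalty roughly tracks the pipeline depth.
    shallow = effective_ipc(2.0, branch_fraction, mispredict_rate, 10)
    deep = effective_ipc(2.0, branch_fraction, mispredict_rate, 20)
    print(f"{name:>16}: IPC {shallow:.2f} -> {deep:.2f} "
          f"({100 * (1 - deep / shallow):.0f}% lower with the deeper pipeline)")
```

With these made-up figures the branchy workload gives up far more IPC to the deeper pipeline than the predictable one, which matches the argument above that multimedia code is the natural beneficiary of a design built for clock speed.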
Hyper Pipelined Technology
The most common problem faced by Intel in the past has been that of increasing the clock speed of its processors, and history is littered with
examples. The Pentium Classic and Pentium MMX reached as far as 233MHz before the design fizzled out. The Pentium Pro, a P6 generation
processor, reached maturity at 200MHz. It was then replaced by the Pentium II, which moved the L2 cache out of the processor package and onto
the cartridge, and after a die shrink managed to reach a grand 450MHz. Die shrinking is another way of gaining clock speed, but one which in
the long term does not yield enough to make it profitable to invest in a new manufacturing plant. Continuing the P6 theme, we all saw the
disaster Intel faced when it attempted to push the Pentium III past the 1GHz barrier and ended up recalling all its 1.13GHz processors. This
is easily done, as the market pressure applied to Intel was too great not to take a 'stab at it'. The gamble was purely down to its 0.13-micron
fabrication process, which was and still is some way in the future (Q3-2001 anticipated).
This brings us to another way of increasing clock speeds. As opposed to shrinking the die as mentioned earlier, one can
make the processor do less per clock cycle. Intel achieves this with what it calls Hyper Pipelined Technology. In simple terms this means
increasing the number of stages in the processor's pipeline. The deeper the pipeline, the more stages an instruction has to pass through to
reach the end of it, so less is achieved per clock. The trade-off is that the processor can be ramped up to higher clock rates and, if the
clock is raised far enough, it ends up achieving more than was given away. The original Pentium (P5) featured only a 5 stage pipeline, which
by today's standards is short. This was increased with the Pentium Pro, Pentium II and Pentium III, which feature a 10 stage
pipeline, double that of the P5, a depth that has restricted the P6 design to clock speeds of around 1GHz. The Intel Pentium 4 doubles the
length of the pipeline again, to 20 stages.
This brings us back to branch prediction and the risk of mis-predicting. Although branch prediction algorithms are highly accurate, they are
not 100% accurate. If the processor mis-predicts a branch, it has to discard the work already in the pipeline and start over, and with a 20
stage pipeline this results in a longer recovery time, which makes for a lower IPC. To minimise this, the NetBurst Micro-architecture
implements an Advanced Dynamic Execution engine and an Execution Trace Cache (see the Architectural
features of Intel Pentium 4 section).
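
How far the clock has to rise before a deeper pipeline pays off can be estimated with a toy cycle count, sketched here in Python. The model is a deliberate simplification of our own: one instruction completes per cycle once the pipeline is full, and every mis-predicted branch costs a refill of (depth - 1) cycles. The names `pipeline_cycles`, `run_time_ns` and `break_even_ghz`, the instruction and mis-predict counts, and the 1.0GHz and 1.5GHz clocks are illustrative assumptions rather than figures from Intel.

```python
def pipeline_cycles(instructions: int, mispredicts: int, depth: int) -> int:
    """Cycles to run a stream on an idealised pipeline of `depth` stages."""
    fill = depth - 1                     # cycles before the first instruction completes
    flushes = mispredicts * (depth - 1)  # refill cost after each mis-predicted branch
    return fill + instructions + flushes


def run_time_ns(instructions: int, mispredicts: int, depth: int, freq_ghz: float) -> float:
    """Run time in nanoseconds (one cycle at 1 GHz lasts 1 ns)."""
    return pipeline_cycles(instructions, mispredicts, depth) / freq_ghz


def break_even_ghz(instructions: int, mispredicts: int,
                   old_depth: int, old_ghz: float, new_depth: int) -> float:
    """Clock at which the deeper pipeline matches the shallower one's run time."""
    old_cycles = pipeline_cycles(instructions, mispredicts, old_depth)
    new_cycles = pipeline_cycles(instructions, mispredicts, new_depth)
    return old_ghz * new_cycles / old_cycles


# Hypothetical stream: one million instructions, 20,000 mis-predicted branches.
N, MISSES = 1_000_000, 20_000

p6_like = run_time_ns(N, MISSES, depth=10, freq_ghz=1.0)
deep_same_clock = run_time_ns(N, MISSES, depth=20, freq_ghz=1.0)
deep_fast_clock = run_time_ns(N, MISSES, depth=20, freq_ghz=1.5)

print(f"10 stages at 1.0 GHz: {p6_like:,.0f} ns")
print(f"20 stages at 1.0 GHz: {deep_same_clock:,.0f} ns")
print(f"20 stages at 1.5 GHz: {deep_fast_clock:,.0f} ns")
print(f"break-even clock for 20 stages: {break_even_ghz(N, MISSES, 10, 1.0, 20):.2f} GHz")
```

At the same clock the 20 stage design is slower in this model, and it only comes out ahead once its clock exceeds the break-even value. The more mis-predicts a workload produces, the higher that break-even clock becomes.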