[Pc_Support] Re: AMD's L1 cache is 4x larger than Intel's L1 cache -- WAS: HP dv2500nr

Bryan J. Smith b.j.smith at ieee.org
Sat Jul 22 11:20:01 EDT 2006


On Sat, 2006-07-22 at 11:03 -0400, Bryan J. Smith wrote:
> Now Core 2 Duo _is_ different.  Intel has finally _chucked_ 

Oops, didn't finish that statement ...

Intel has finally _chucked_ the _inefficient_ Netburst architecture.
Netburst existed because it was a "stop gap" design, meant merely to
bridge from IA-32 P6/Pentium Pro (ignoring the original Pentium, it had
too many bugs) to IA-64 Itanium.  So Intel did a quick, 18-month
"refit" by extending the pipes and doing _no_ redesign.

Why?  7 years ago, Intel thought we would all be on Itanium by now.
Unfortunately, that wasn't reality (don't get me started on IA-64 --
it's a "computer science" "paper ideal" that almost _every_ single one
of us "electrical engineers" said would _fail_, _utterly_, and it
did ;-).

So Intel _finally_ decided to _really_ rev the P6/Pentium Pro with a
full 36-48 month re-design circa 2002-2003.  That is now the Core
architecture.  It's very efficient, just like the P6 was -- totally the
opposite of the Netburst (P4).

Understand that this is Intel's _first_ IA-32[e] redesign since the
Pentium Pro of 1995.  So now, with the introduction of the Core design,
Intel has a 5-6 year lead on AMD -- in addition to their 12-18 month
fabrication lead.

AMD's last redesign was the Athlon (yes, 32-bit) in 1999.  The
A64/Opteron (including Turion64) is the _same_ architecture.  The
original Athlon was actually a 40-bit platform -- based on the 64-bit
Digital Alpha 21x64 -- and easily extended to 64-bit registers.  That's
why AMD could do it without a major redesign.

The _only_ major advantage that AMD has over Intel is in the 2+ way
space, especially 4+ way.  AMD had already built a non-shared, _truly_
switched platform interconnect in the Athlon, based on the Digital EV6.
They merely moved it from a switch crossbar to a partial mesh.  In fact,
as I understand it, AMD's multicore is the EV6 Xbar internally, meaning
they could easily go to 13-core with_out_ any redesign (EV6 is 16-way,
minus 3: 2 for the dual DDR channels plus 1 for HyperTransport).
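
For what it's worth, that port arithmetic can be written out as a quick
sketch.  The names below are made up for illustration, and the 16/2/1
figures are simply the ones quoted above, taken at face value rather
than from any datasheet.

```python
# Port budget for the crossbar as described in this post.
# All figures are the post's own claims, not datasheet values.
XBAR_PORTS = 16           # "EV6 is 16-way"
DDR_CHANNEL_PORTS = 2     # "2 for the dual DDR channels"
HYPERTRANSPORT_PORTS = 1  # "1 for HyperTransport"


def max_cores(total_ports, reserved_ports):
    """Return the ports left for cores after the reserved ones are removed."""
    return total_ports - sum(reserved_ports)


cores = max_cores(XBAR_PORTS, [DDR_CHANNEL_PORTS, HYPERTRANSPORT_PORTS])
print(cores)
```

Any additional port reserved for memory or I/O comes straight out of the
core count, which is all the "16-way minus 3" remark is saying.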

The switched/mesh design _forces_ AMD to put an I/O MMU on the CPU.
This took _years_ for AMD to mature, which it did in the original
Athlon MP (yes, 32-bit).  That's because of memory coherency for I/O --
with a "shared bus" Intel just relies on the chipset.  AMD has to
maintain _full_ I/O coherency on _each_ processor.  Now that has
turned into a _major_ advantage -- as Opterons maintain memory mapped
I/O affinity to _each_ processor.

The result is that Opteron scales much, much better than Intel's Xeon
"shared bus" design.  Even its forthcoming split bus is still not the
same, although it will allow Xeon to scale a little better in
combination with IBM's X3 architecture (long story).

But it's clear that with Intel's return to the _efficient_ Pentium Pro
base in the Core processor, and the _end_ of Netburst (P4), Intel is
back in the lead as far as ALU performance goes.  Combined with their
proliferation of "lossy math" SSE (whereas AMD does SSE with its full,
3-issue FPU for greater precision -- great for scientific/engineering
apps, but who cares about video/games?), Intel is in a pretty good
position.

-- 
Bryan J. Smith          Professional, technical annoyance
mailto:b.j.smith at ieee.org    http://thebs413.blogspot.com
---------------------------------------------------------
The world is in need of solutions.  Unfortunately, people
seem to be more interested in blindly aligning themselves
with one of only two viewpoints -- an "us v. them" debate
that has nothing to do with finding an actual solution.




