The Next Platform

Intel Brings A Big Fork To A Server CPU Knife Fight

With Intel’s foundry still trying to get caught up with the process and packaging offered by archrival Taiwan Semiconductor Manufacturing Co, Intel’s server CPU product line has to “make do” with what the foundry has and create products that give the right mix of performance and price to compete with CPU rival AMD in the X86 space and the Arm collective that is creating a new CPU tier in the datacenter.

And so, Intel has decided to fork its product line into machines that use true Xeon cores – what are known as P-cores, short for performance cores – and gussied up Atom cores – what are known as E-cores, short for energy-efficient cores. This is not so much a new bifurcation of the Intel Xeon product line as it is a hardening of the principles that Intel has had for more than a decade. (The “Knights Landing” Xeon Phi processor was a muscle car version of the Atom CPU, sporting the first implementation of the AVX-512 vector math unit pinned to a server core that would have barely been able to run your cell phone.)

We grew up in Appalachia, and we live back in the mountains again after living three decades in New York City, and we understand that under the right circumstances – or more accurately, the wrong ones – a fork can be just as dangerous as a knife. (Look at how well Aquaman does with a golden fork.) You have to sharpen a spoon on the stone wall for a long while, but you can make that dangerous, too. . . .

This time around, Intel is not creating a toy server CPU based on Atom-style cores, limiting main memory and I/O expansion, and hoping companies will buy lots of them and cram them into racks like canned goods for the winter. Rather, Intel is ganging up much larger numbers of Atom cores inside a real server socket, with real memory and I/O capacities, that plugs into standard Xeon server platforms to offer excellent price/performance and thermals for high throughput workloads where a standard Xeon P-core with HyperThreading just does not do the trick.

In the long run – meaning over the next five years or so – the market will decide if having two radically different cores with nearly the same instruction set can compete against two more similar cores with different layouts and half the L3 cache per core. The latter is the strategy of AMD, which is drawing a subtler distinction between its standard Zen cores, like the Zen 4 cores used in the “Genoa” variants of the Epyc 9000 series, and the “Bergamo” high core count and “Siena” low thermal server CPUs that are based on the Zen 4c cores.

The thing to remember is that while AMD now has a 33 percent shipment share of X86 server CPUs, as Lisa Su pointed out in her keynote address at Computex 2024 yesterday, Intel still has the other 67 percent – and that is with its foundry arm self-imposedly tied behind its back. But it is wiggling out of the ropes.

Intel will get its foundry sorted by 2025 or so, and it has plenty of good architects who can deliver excellent CPU designs and maybe even a competitive GPU with its “Falcon Shores” effort. It is working on getting better yield on its packaging. Intel will compete, and life will get harder for AMD and the Arm-y. The two flavors of the Xeon 6 lineup – the initial “Sierra Forest” E-core chips that are starting to roll out at Computex and the initial “Granite Rapids” P-core chips that are coming out in the third quarter – are the first steps toward closing the CPU server gap for Intel. In a year and a half, this will be a real knife fight, and we expect that the market shares in the X86 space could be just about even. And it won’t be long before Arm has 20 percent share of overall server shipments and RISC-V starts to get a few adherents here and there.

This CPU fight in the datacenter is far from settled.

Two Targets, One-ish Architecture

Intel has been talking about this E-core and P-core strategy for a while, but it bears going into some of the central tenets before getting into the first batch of Sierra Forest chips that Intel is talking about. There will be others. Intel is not just doing a big bang launch of the entire product line all at once, and we have a suspicion that it is capacity constrained on the Intel 7 and Intel 3 processes that are used to create the Sierra Forest chips.

The chart above, which is an amalgam we made of two Intel charts, says that the P-core variant of the Xeon 6 is aimed at AI workloads, but it is also aimed at HPC simulation and modeling and indeed any kind of workload where a stronger core is a better option than a weaker one. AI is but one kind of compute intensive workload, and admittedly it might be the most interesting one for enterprises that are considering using pretrained generative AI models and retraining them with their own data to run AI workloads on premises in their CPU fleets.

With the E-core chips not having AVX-512 vector units or AMX matrix math units, they cannot really do much in the way of AI or HPC processing. They are really designed for application, print, file, and web serving, and in some cases the E-core variants might be good at other kinds of microservices applications where chunks of code are fairly modest. Video streaming, media transcoding, and other kinds of data streaming are ideal for the E-core machines, says Intel.

With both the E-core and P-core designs, the memory and I/O controllers as well as the UltraPath Interconnect (UPI) links for NUMA shared memory clustering of CPUs are separated out from the cores, which reside on one, two, or three banks of chiplets. The “Sapphire Rapids” Xeon SP v4 launched in January 2023 had everything on each chiplet and integrated four of them to make a socket. With the “Emerald Rapids” Xeon SP v5 launched in December 2023, Intel backed down to two chiplets with slightly more aggregate cores, but all of the controllers were still on the same chiplets as the cores. There were also single chiplet, monolithic implementations of the Sapphire Rapids and Emerald Rapids chips for low and middle core count devices.

The core complexes on the Sierra Forest Xeon 6 processors are etched with the 7 nanometer Intel 3 process and the I/O and memory chiplets are etched with a further refined 10 nanometer Intel 7 process similar to the one used for Sapphire Rapids and Emerald Rapids.

The Xeon 6 processors will come in two package families, dubbed the 6700 and the 6900, which will be further differentiated by their use of E-core and P-core tiles. There is no Xeon 6 that will mix E-core and P-core chiplets in the same package, but presumably if someone wanted such a beast, Intel would build it.

Here are the specs on the 6700 series and the 6900 series:

In essence, the 6700 series sockets are “virtual” low core count (LCC), high core count (HCC), and extreme core count (XCC) chips stitched together with EMIB packaging. There doesn’t seem to be a middle core count (MCC) variant.

Here is what the Xeon 6 6700 series die packages look like:

And here is what the 6900 series die packages look like:

The rollout of the Xeon 6 family of server CPUs is going to be staggered, and this, according to Intel, is based on feedback from customers. The lower end Sierra Forest E-core chips come out first, followed by the Granite Rapids higher end P-core chips in the third quarter:

In the first quarter next year, the fatter Sierra Forest chips with up to 288 cores come out, and so will the lower bin Granite Rapids chips in the 6300, 6500, and 6700 series. There is also going to be an SoC variant of the Granite Rapids chip, most likely for edge use cases where beefy cores and vector and matrix math units are used for AI inference processing.

There has never been a beefy Atom server chip from Intel before, so it is difficult to make direct comparisons to the current Xeon SP and the future Xeon 6 beefy cored machines. In its presentations, Intel compares the Sierra Forest Xeon 6 6700 chips to the second generation Xeon SP processors, which most people know by their codename “Cascade Lake,” and which were launched in April 2019. Based on Intel’s benchmarks and our own analysis, we concur that the instructions per clock of the Atom-based E-core is about the same for integer work as that of the Cascade Lake Xeon SPs. If you do the math, an E-core in Sierra Forest has about 65 percent of the performance of an Emerald Rapids P-core. It all matches.

We will be doing a deeper architectural dive on the Xeon 6 6700E family, but in the meantime, here is the fairly modest SKU stack, which only has seven variants:

In Q1 2025, Intel will double up the performance of the Sierra Forest chips with two compute tiles and two I/O and memory controller tiles to create the Xeon 6 6900E, which is known as a ZCC package and which will have up to 288 cores.

Obviously, if you pay for your software on a per-core basis, the E-core variants might be a tough sell. But if you write your own microservices software or pay by the socket, then software pricing is not an issue and an E-core Xeon 6 might be the answer in terms of lowering thermals and costs while getting acceptable throughput.

Here is our usual performance comparison and pricing chart, which provides a raw performance metric relative to the four-core “Nehalem” Xeon E5500 from March 2009. These performance metrics take into account cores, clocks, and IPC across generations.
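For those who want to play along at home, the methodology can be sketched in a few lines of code – assuming, as described above, that relative performance scales with cores, clock speed, and per-core IPC, all normalized against the four-core Nehalem baseline. The baseline clock speed and the IPC factors here are illustrative placeholders, not the actual figures behind our chart:

```python
def relative_perf(cores: int, clock_ghz: float, ipc_factor: float,
                  base_cores: int = 4, base_clock_ghz: float = 2.26,
                  base_ipc: float = 1.0) -> float:
    """Relative performance versus a baseline chip, assuming performance
    scales linearly with core count, clock speed, and per-core IPC.
    The default baseline clock is a hypothetical stand-in for the
    "Nehalem" Xeon E5500 reference point used in the chart."""
    return (cores * clock_ghz * ipc_factor) / (base_cores * base_clock_ghz * base_ipc)

# The baseline chip normalizes to 1.0 by construction; a hypothetical
# chip with twice the cores at the same clock and IPC scores 2.0.
print(relative_perf(4, 2.26, 1.0))  # 1.0
print(relative_perf(8, 2.26, 1.0))  # 2.0
```

The interesting work, of course, is in estimating the IPC factor for each core generation, which is where the benchmarks come in.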

The “performance general purpose” high-bin parts of the Emerald Rapids Xeon SP v5 processors range from 8 to 64 cores and from 16 to 128 threads and they have a relative performance that ranges from 5.85 to 27.78 by our methodology. Prices range from $1,099 to $11,600 in 1,000-unit tray quantities from Intel. The Sierra Forest chips do not have HyperThreading, and range from 64 to 144 cores (which means you only have from 64 to 144 threads). Prices range from $2,749 to $11,350, but relative performance ranges from 22.89 to 47.20, and that means the bang for the buck is anywhere from 19 percent to 43 percent better. For a given wattage, the performance is twice as high, or for a given performance, the wattage is half as much. Speaking very generally, of course.

The comparison with Cascade Lake Xeon SP v2 server CPUs is fun. The top bin Cascade Lake from 2019 had 56 P-cores and 112 threads running at 2.6 GHz and delivered 21.69 units of oomph at a cost of more than $946 per unit of performance. The low-end Sierra Forest CPU from 2024 has 64 E-cores running at 2.4 GHz and a relative performance of 22.89, but the cost per unit of performance is just a smidgen over $120. That is a factor of 7.9X improvement in price/performance over the past five years. And that top bin Cascade Lake part consumed 400 watts, far more than the low-bin Xeon 6 6710E processor in the Sierra Forest lineup does.

The top bin Sierra Forest 6700E part does more than twice the work of the low-bin part, and its cost per unit of performance is roughly double as well, so the price/performance gap with the top bin Cascade Lake part is half as great. But even 3.95X is pretty good.
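The arithmetic behind those two ratios is easy to check from the figures quoted above – the $946 per unit of oomph for the top bin Cascade Lake part, and the list prices and relative performance numbers for the low-bin and top-bin Sierra Forest 6700E parts:

```python
# Figures quoted in the text above.
cascade_lake_top_dollars_per_perf = 946.0  # top bin Cascade Lake, 2019

sierra_forest_low = {"price": 2749.0, "perf": 22.89}   # low-bin 6700E
sierra_forest_top = {"price": 11350.0, "perf": 47.20}  # top-bin 6700E

def dollars_per_perf(chip: dict) -> float:
    """Cost per unit of relative performance."""
    return chip["price"] / chip["perf"]

low = dollars_per_perf(sierra_forest_low)   # a smidgen over $120
top = dollars_per_perf(sierra_forest_top)   # roughly double that

print(round(cascade_lake_top_dollars_per_perf / low, 1))  # 7.9
print(round(cascade_lake_top_dollars_per_perf / top, 1))  # 3.9
```

The top-bin ratio comes out at about half the low-bin ratio, as you would expect given that the cost per unit of performance doubles up the stack; the small difference from the 3.95X figure in the text is rounding in the quoted prices.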

Up next, we will do a deeper architecture dive on Sierra Forest and what we can surmise about Granite Rapids.
