Comments on: Optimizing AI Inference Is As Vital As Building AI Training Beasts https://www.nextplatform.com/2023/09/11/optimizing-ai-inference-is-as-vital-as-building-ai-training-beasts/ In-depth coverage of high-end computing at large enterprises, supercomputing centers, hyperscale data centers, and public clouds. Mon, 18 Sep 2023 15:37:34 +0000
By: EC https://www.nextplatform.com/2023/09/11/optimizing-ai-inference-is-as-vital-as-building-ai-training-beasts/#comment-213521 Wed, 13 Sep 2023 14:48:05 +0000 https://www.nextplatform.com/?p=142914#comment-213521 In reply to Rakesh Cheerla.

AI co-design is exactly what is happening today. If one listens closely to how Nvidia describes its development cycle, there is an interesting choreography between discovery, modeling, testing, and implementation, in both hardware and software, with the assistance of its in-house supercomputers. Nvidia claims ML assists in both hardware and software design. After implementation, new bottlenecks and opportunities are uncovered, so the cycle starts all over again.

By: Rakesh Cheerla https://www.nextplatform.com/2023/09/11/optimizing-ai-inference-is-as-vital-as-building-ai-training-beasts/#comment-213492 Tue, 12 Sep 2023 21:14:26 +0000 https://www.nextplatform.com/?p=142914#comment-213492
“The history of computing teaches us that software always and necessarily lags hardware, and unfortunately that lag can stretch for many years when it comes to wringing the best performance out of iron by tweaking algorithms.”

----
What’s also true is that the software comes first, but at a lower performance.
The hardware “accelerates” a portion of that software, requiring some of the software to be rewritten for the hardware accelerator, which takes years, particularly if the hardware implementation doesn’t take software constraints into account.
With better architecture-software-hardware co-design, we might reduce the time that software lags hardware. Perhaps AI algorithms will help make the trade-offs across software and hardware; humans are too biased in one direction or the other 🙂

By: Slim Albert https://www.nextplatform.com/2023/09/11/optimizing-ai-inference-is-as-vital-as-building-ai-training-beasts/#comment-213488 Tue, 12 Sep 2023 17:40:11 +0000 https://www.nextplatform.com/?p=142914#comment-213488 In reply to Timothy Prickett Morgan.

Great points (including about the Gaudi roadmap)!

By: HuMo https://www.nextplatform.com/2023/09/11/optimizing-ai-inference-is-as-vital-as-building-ai-training-beasts/#comment-213487 Tue, 12 Sep 2023 17:28:34 +0000 https://www.nextplatform.com/?p=142914#comment-213487
These 300 PhDs do great work — a most worthy investment in the natural intelligence of brainy young grasshoppers! Adapting algorithms to provide the best match between a workload and the new features and characteristics of recent hardware is a major job, with serious rewards, as exquisitely demonstrated here.

For the most rotund of hefty LLMs, it seems that (on top of FP8 and in-flight batching) particular attention is also needed on how to seat the massive weights across multiple devices, through tensor parallelism. q^8
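To make that concrete, here is a minimal sketch of the column-parallel flavor of tensor parallelism, with NumPy arrays standing in for per-device shards (the names and shapes are illustrative, not any vendor’s API):

    import numpy as np

    rng = np.random.default_rng(0)
    d_model, d_ff, n_dev = 8, 16, 2

    x = rng.standard_normal((4, d_model))      # a small batch of activations
    W = rng.standard_normal((d_model, d_ff))   # full weight matrix of one layer

    # Column-parallel split: each "device" holds a slice of W's columns,
    # computes its partial output, and the slices are concatenated at the end.
    shards = np.split(W, n_dev, axis=1)
    partials = [x @ w for w in shards]         # one matmul per (stand-in) device
    y_parallel = np.concatenate(partials, axis=1)

    assert np.allclose(x @ W, y_parallel)      # matches the unsharded result

Applied layer by layer, splits like this are one way weight sets far larger than any single device’s memory get served.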

By: Timothy Prickett Morgan https://www.nextplatform.com/2023/09/11/optimizing-ai-inference-is-as-vital-as-building-ai-training-beasts/#comment-213484 Tue, 12 Sep 2023 16:49:21 +0000 https://www.nextplatform.com/?p=142914#comment-213484 In reply to Matt.

If there is availability, it is just because Intel made a certain amount and can now plunk some on its developer cloud and sell the rest. I doubt Intel can make more than whatever it had planned, but it gives Intel something to talk about when there is a bunch of tough news out there for the company.

By: Matt https://www.nextplatform.com/2023/09/11/optimizing-ai-inference-is-as-vital-as-building-ai-training-beasts/#comment-213477 Tue, 12 Sep 2023 13:42:57 +0000 https://www.nextplatform.com/?p=142914#comment-213477 In reply to Slim Albert.

Why would they be more available? The bottleneck is CoWoS, and Gaudi2 still relies on CoWoS. It also probably costs significantly more in terms of manpower to get a Gaudi2 system up and running than an H100 system. Sure, customers will buy whatever is available, but only because it is available, not because it is cheaper to operate. The H100 is likely the lowest-cost and most productive solution.
The Cerebras system is probably the one that stands to gain the most from the CoWoS shortage, because it doesn’t use CoWoS and seems mature enough in terms of software and systems integration to be useful.

By: Slim Albert https://www.nextplatform.com/2023/09/11/optimizing-ai-inference-is-as-vital-as-building-ai-training-beasts/#comment-213462 Tue, 12 Sep 2023 04:18:49 +0000 https://www.nextplatform.com/?p=142914#comment-213462 In reply to EC.

Sally Ward-Foxton (over at EE Times) has a nice bar chart showing that Gaudi2 gets 80% of H100 performance on GPT-J 6B (MLPerf v3.1). That should be satisfactory to system builders if those chips are more available and less expensive than the contenders (IMHO).
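As a back-of-the-envelope illustration of that trade-off (the 0.80 figure is the MLPerf result above; the price ratio is a purely hypothetical placeholder):

    # Hypothetical perf-per-dollar comparison, normalized to the H100.
    gaudi2_rel_perf = 0.80    # ~80% of H100 throughput on GPT-J 6B (MLPerf v3.1)
    gaudi2_rel_price = 0.60   # assumed price ratio -- purely illustrative
    rel_perf_per_dollar = gaudi2_rel_perf / gaudi2_rel_price
    print(f"Gaudi2 perf/$ relative to H100: {rel_perf_per_dollar:.2f}x")  # ~1.33x

Any price ratio below 0.80 would put Gaudi2 ahead on that metric.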

By: EC https://www.nextplatform.com/2023/09/11/optimizing-ai-inference-is-as-vital-as-building-ai-training-beasts/#comment-213441 Mon, 11 Sep 2023 20:03:54 +0000 https://www.nextplatform.com/?p=142914#comment-213441
Thanks Tim, great write-up as usual.
What is disappointing is the lack of competitive comparisons, even in the MLPerf v3.1 benchmarks released today (https://mlcommons.org/en/inference-datacenter-31/). Though Intel Gaudi2 (7nm) looks interesting, I can’t see data centers putting a lot of investment into it (or 5nm Gaudi3) without a better roadmap.
