Comments on: How Did DeepSeek Train Its AI Model On A Lot Less – And Crippled – Hardware? https://www.nextplatform.com/2025/01/27/how-did-deepseek-train-its-ai-model-on-a-lot-less-and-crippled-hardware/ In-depth coverage of high-end computing at large enterprises, supercomputing centers, hyperscale data centers, and public clouds. Wed, 05 Feb 2025 04:15:23 +0000 hourly 1 https://wordpress.org/?v=6.7.1 By: Timothy Prickett Morgan https://www.nextplatform.com/2025/01/27/how-did-deepseek-train-its-ai-model-on-a-lot-less-and-crippled-hardware/#comment-247199 Fri, 31 Jan 2025 20:38:11 +0000 https://www.nextplatform.com/?p=145225#comment-247199 In reply to Erik Klipping.

Or like comparing 3D graphics to actually driving a car...

]]>
By: Gravitycreatedlife https://www.nextplatform.com/2025/01/27/how-did-deepseek-train-its-ai-model-on-a-lot-less-and-crippled-hardware/#comment-247176 Fri, 31 Jan 2025 15:47:41 +0000 https://www.nextplatform.com/?p=145225#comment-247176 Back door to a back door, straight to the CCP.

]]>
By: Erik Klipping https://www.nextplatform.com/2025/01/27/how-did-deepseek-train-its-ai-model-on-a-lot-less-and-crippled-hardware/#comment-247083 Thu, 30 Jan 2025 21:51:28 +0000 https://www.nextplatform.com/?p=145225#comment-247083 I asked it a simple question about a legal issue in my country, and it succeeded in generating a human-like answer but failed to answer the question correctly. I understand that the potential lies in efficiency, but this is like comparing real-time 3D graphics on a C64 with 3D on modern hardware: they both look like 3D graphics, but one of them can only be used in a highly specialized use case.

]]>
By: Scott ho https://www.nextplatform.com/2025/01/27/how-did-deepseek-train-its-ai-model-on-a-lot-less-and-crippled-hardware/#comment-246931 Tue, 28 Jan 2025 23:20:45 +0000 https://www.nextplatform.com/?p=145225#comment-246931 Commentators testing bias have shown that DeepSeek can generate information relating to subjects censored by China. (They tell the LLM to use substitute characters, bypassing internal censors.) This indicates that the LLM was trained outside the Chinese firewall, which opens the possibility that it was trained outside China on higher-spec hardware.

]]>
By: John W https://www.nextplatform.com/2025/01/27/how-did-deepseek-train-its-ai-model-on-a-lot-less-and-crippled-hardware/#comment-246921 Tue, 28 Jan 2025 18:22:56 +0000 https://www.nextplatform.com/?p=145225#comment-246921 Rather like having a family with your sister, training one model on the output of another is how the insanity starts.

]]>
By: Mehdi Zoghlami https://www.nextplatform.com/2025/01/27/how-did-deepseek-train-its-ai-model-on-a-lot-less-and-crippled-hardware/#comment-246919 Tue, 28 Jan 2025 17:17:19 +0000 https://www.nextplatform.com/?p=145225#comment-246919 As I said before, more sanctions on China will only lead to more innovation from Chinese engineers. And Trump is going to Make China Great Again.

]]>
By: Paul Berry https://www.nextplatform.com/2025/01/27/how-did-deepseek-train-its-ai-model-on-a-lot-less-and-crippled-hardware/#comment-246916 Tue, 28 Jan 2025 16:36:01 +0000 https://www.nextplatform.com/?p=145225#comment-246916 Many in and out of Nvidia are claiming that this is actually a validation of the technology; that refining the methods of doing AI will make it plausible for more than three or four big companies to offer technology based on AI. I don’t know to what degree that’s true, but I really feel it’s necessary. I’m honestly not that impressed by what the AI industry has offered to date, and I doubt it will be all that useful to a lot of industries. I think we need a lot more improvement before AI can be widely useful, and I’d rather have dozens of places trying to improve the state of the art than just a handful.

]]>
By: Fernanda https://www.nextplatform.com/2025/01/27/how-did-deepseek-train-its-ai-model-on-a-lot-less-and-crippled-hardware/#comment-246912 Tue, 28 Jan 2025 15:19:56 +0000 https://www.nextplatform.com/?p=145225#comment-246912 Another banger Tim. Nice deepdive on deepseek… ha

]]>
By: Timothy Prickett Morgan https://www.nextplatform.com/2025/01/27/how-did-deepseek-train-its-ai-model-on-a-lot-less-and-crippled-hardware/#comment-246910 Tue, 28 Jan 2025 14:35:52 +0000 https://www.nextplatform.com/?p=145225#comment-246910 In reply to Sunil Verma.

Interesting scenario, Sunil. Thanks for that.

]]>
By: Sunil Verma https://www.nextplatform.com/2025/01/27/how-did-deepseek-train-its-ai-model-on-a-lot-less-and-crippled-hardware/#comment-246909 Tue, 28 Jan 2025 14:18:51 +0000 https://www.nextplatform.com/?p=145225#comment-246909 What impresses me about DeepSeek-V3 is that it has 671B parameters but activates only 37B of them for each token. Instead of trying to spread the load equally across all the experts in a Mixture-of-Experts model, as DeepSeek-V3 does, experts could be specialized to a particular domain of knowledge so that the set of parameters activated for a given query would not change rapidly. This would allow a chip like Sapphire Rapids Xeon Max to hold the 37B activated parameters in HBM while the rest of the 671B parameters sit in DIMMs. That would be an ideal inference server for a small or medium-sized business, and queries would stay behind the company’s firewall. Unlike data center GPUs, this hardware could be used for general-purpose computing when it is not needed for AI. The HBM bandwidth of Sapphire Rapids Xeon Max is only 1.23 TBytes/sec, so that needs to be fixed, but the overall architecture with both HBM and DIMMs is very cost-effective. The reason it is cost-effective is that DeepSeek-V3 has 18x more total parameters than activated parameters, so only a small fraction of the parameters need to be in costly HBM. Imagine a Xeon Diamond Rapids with 4.8 TBytes/sec of HBM3E bandwidth. That could generate about 4800B / 37B = 130 tokens/sec using DeepSeek-V3.
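The tokens/sec figure above can be checked with a quick back-of-envelope sketch. It assumes decoding is purely memory-bandwidth bound and that each generated token streams every active parameter from memory once at 1 byte per parameter (the implicit assumption in dividing 4800B by 37B; the `tokens_per_sec` helper is illustrative, not from the comment):

```python
def tokens_per_sec(bandwidth_gb_s: float, active_params_billions: float,
                   bytes_per_param: float = 1.0) -> float:
    """Rough tokens/sec if decoding is purely memory-bandwidth bound:
    each token must stream all active parameters from memory once."""
    gb_per_token = active_params_billions * bytes_per_param  # 1B params * 1 B/param = 1 GB
    return bandwidth_gb_s / gb_per_token

# Sapphire Rapids Xeon Max HBM (~1.23 TB/s) vs. a hypothetical 4.8 TB/s part,
# with DeepSeek-V3's 37B activated parameters:
print(round(tokens_per_sec(1230, 37)))   # ~33 tokens/sec
print(round(tokens_per_sec(4800, 37)))   # ~130 tokens/sec
```

Halving the bytes per parameter (or the active parameter count) would double the estimate, which is why the 37B-of-671B activation ratio matters so much for this kind of hardware.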

NVIDIA’s market cap fell by $589B on Monday. This loss in market cap is about 7x more than Intel’s current market cap ($87.5B). At NVIDIA’s new lower market cap ($2.9T), NVIDIA still has a 33x higher market cap than Intel.

Timothy Prickett Morgan wrote a good article about Xeon Max here:

nextplatform.com/2022/11/15/sapphire-rapids-xeon-sps-plus-hbm-offer-big-performance-boost

]]>