News Posts matching #DeepSeek


Moore Threads Teases Excellent Performance of DeepSeek-R1 Model on MTT GPUs

Moore Threads, a Chinese manufacturer of proprietary GPU designs, is reportedly the latest company to jump onto the DeepSeek-R1 bandwagon. Since late January, NVIDIA, Microsoft and AMD have swooped in with their own interpretations and deployments. By global standards, Moore Threads GPUs trail behind Western-developed offerings; early 2024 evaluations showed the firm's MTT S80 dedicated desktop graphics card struggling against an AMD integrated solution, the Radeon 760M. The recent emergence of DeepSeek's open source models has signalled a shift away from reliance on extremely powerful and expensive AI-crunching hardware (often accessed via the cloud), and widespread excitement has been generated by DeepSeek solutions being relatively frugal in terms of processing requirements. Tom's Hardware has observed cases of open source AI models running locally on "inexpensive hardware, like the Raspberry Pi."

According to recent Chinese press coverage, Moore Threads has announced a successful deployment of DeepSeek's R1-Distill-Qwen-7B distilled model on the aforementioned MTT S80 GPU. The company also revealed that it had taken similar steps with its MTT S4000 datacenter-oriented graphics hardware. On the subject of adaptation, a Moore Threads spokesperson stated: "based on the Ollama open source framework, Moore Threads completed the deployment of the DeepSeek-R1-Distill-Qwen-7B distillation model and demonstrated excellent performance in a variety of Chinese tasks, verifying the versatility and CUDA compatibility of Moore Threads' self-developed full-featured GPU." Exact performance figures, benchmark results and technical details were not disclosed to the Chinese public, so Moore Threads appears to be teasing the prowess of its MTT GPU designs. ITHome reported that "users can also perform inference deployment of the DeepSeek-R1 distillation model based on MTT S80 and MTT S4000. Some users have previously completed the practice manually on MTT S80." Moore Threads states that its "self-developed high-performance inference engine, combined with software and hardware co-optimization technology, significantly improves the model's computing efficiency and resource utilization through customized operator acceleration and memory management. This engine not only supports the efficient operation of the DeepSeek distillation model, but also provides technical support for the deployment of more large-scale models in the future."
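
Neither Moore Threads nor ITHome published the exact commands, but the Ollama route described above looks the same from the client side on any backend. A minimal sketch, assuming the server build handles the GPU support and that the distill is published under Ollama's common deepseek-r1:7b tag:

```python
# Minimal client-side sketch of running a DeepSeek-R1 distill through Ollama.
# Assumes an Ollama server is already running (GPU support lives in the
# server build, not the client) and that the 7B distill is available under
# the common "deepseek-r1:7b" tag -- both are assumptions, not from the article.
import ollama

response = ollama.chat(
    model="deepseek-r1:7b",
    messages=[{"role": "user", "content": "Explain test-time scaling in one paragraph."}],
)
print(response["message"]["content"])
```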

Huawei Delivers Record $118 Billion Revenue with 22% Yearly Growth Despite US Sanctions

Huawei Technologies reported a robust 22% year-over-year revenue increase for 2024, reaching 860 billion yuan ($118.27 billion), demonstrating remarkable resilience amid continued US-imposed trade restrictions. The Chinese tech giant's resurgence was primarily driven by its revitalized smartphone division, which captured 16% of China's domestic market share, overtaking Apple in regional sales. This achievement was notably accomplished by deploying domestically produced chipsets, marking a significant milestone for the company. In collaboration with Chinese foundry SMIC, Huawei delivers in-house silicon that integrates with HarmonyOS for complete vertical integration. The company's strategic diversification into automotive technology has emerged as a crucial growth vector, with its smart car solutions unit delivering autonomous driving software and specialized chips to Chinese EV manufacturers.

In parallel, Huawei recently announced that its Ascend 910B/C AI platform supports DeepSeek's R1 large language model, with availability on Chinese AI cloud providers like SiliconFlow. Through a strategic partnership with AI infrastructure startup SiliconFlow, Huawei is enhancing its Ascend cloud service capabilities, further strengthening its competitive position in the global AI hardware market despite ongoing international trade challenges. Even if the company can't compete on raw performance with the latest solutions from NVIDIA and AMD, owing to its lack of access to the advanced manufacturing required for AI accelerators, it can compete on cost, typically delivering a much better price/performance ratio. Huawei's Ascend AI solutions deliver modest performance, but the pricing makes AI model inference very cheap, with API costs of around one Yuan per million input tokens and four Yuan per million output tokens on DeepSeek R1.

AMD Faces Investor Skepticism as AI Market Moves Toward Custom Chips

AMD is set to share its fourth-quarter results on Tuesday, Feb. 4, facing both opportunities and problems in the fast-changing AI chip market, and investors are expected to look closely at AMD's AI strategy. Reuters reports that analysts expect AMD's revenue to increase by over 22% to $7.53 billion, with its data center segment making up more than half of total sales at $4.15 billion. Yet investors still worry about how AMD stands in the AI race. TD Cowen analysts and Omdia believe AMD could sell $10 billion worth of AI chips this year, twice the $5 billion AMD itself forecasts. However, the scene is getting more complex, with Big Tech firms like Microsoft, Amazon, and Meta making their own special chips for AI work. This move to custom chips, along with NVIDIA's strong market position and its popular CUDA software, makes things tough for AMD. The high costs of switching chipmakers also make it hard for AMD to grow its share of the market, though the ongoing increase in AI spending by tech giants could help balance out these problems. Investors see "custom silicon and NVIDIA as the AI chip market going forward," said Ryuta Makino, analyst at AMD investor Gabelli Funds.

Supply chain issues complicate AMD's position: TSMC is boosting its advanced packaging capacity to fix bottlenecks, while NVIDIA's production ramp of its new "Blackwell" AI chips might restrict AMD's access to manufacturing resources. Yet there is some good news for AMD's business: its personal computer unit should grow by almost 33% to $1.94 billion, catching up to Intel.

NVIDIA GeForce RTX 50 Series AI PCs Accelerate DeepSeek Reasoning Models

The recently released DeepSeek-R1 model family has brought a new wave of excitement to the AI community, allowing enthusiasts and developers to run state-of-the-art reasoning models with problem-solving, math and code capabilities, all from the privacy of local PCs. With up to 3,352 trillion operations per second of AI horsepower, NVIDIA GeForce RTX 50 Series GPUs can run the DeepSeek family of distilled models faster than anything on the PC market.

A New Class of Models That Reason
Reasoning models are a new class of large language models (LLMs) that spend more time on "thinking" and "reflecting" to work through complex problems, while describing the steps required to solve a task. The fundamental principle is that any problem can be solved with deep thought, reasoning and time, just like how humans tackle problems. By spending more time—and thus compute—on a problem, the LLM can yield better results. This phenomenon is known as test-time scaling, where a model dynamically allocates compute resources during inference to reason through problems. Reasoning models can enhance user experiences on PCs by deeply understanding a user's needs, taking actions on their behalf and allowing them to provide feedback on the model's thought process—unlocking agentic workflows for solving complex, multi-step tasks such as analyzing market research, performing complicated math problems, debugging code and more.
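
One concrete way to picture test-time scaling is self-consistency sampling: draw several independent reasoning chains for the same prompt and keep the majority answer, trading extra inference compute for reliability. A minimal, illustrative sketch with a toy generator standing in for a real LLM backend:

```python
# Illustrative self-consistency sketch: spend more inference compute
# (multiple sampled reasoning chains) to get a more reliable final answer.
import random
from collections import Counter

def generate(prompt: str, temperature: float = 0.8) -> str:
    """Stand-in for an LLM call; a real backend would sample a full
    chain of thought and return the final answer it arrives at."""
    # Toy behaviour: a noisy solver that is right ~70% of the time.
    return "42" if random.random() < 0.7 else random.choice(["41", "43"])

def self_consistency(prompt: str, n_samples: int = 8) -> str:
    # Sample several independent reasoning chains at non-zero temperature,
    # then keep the majority answer. More samples = more test-time compute,
    # and (up to a point) a better result.
    answers = [generate(prompt, temperature=0.8) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

print(self_consistency("What is 6 x 7?"))  # almost always "42"
```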

Huawei Ascend 910B Accelerators Power Cloud Infrastructure for DeepSeek R1 Inference

When High-Flyer, the hedge fund behind DeepSeek, debuted its flagship model, DeepSeek R1, the tech world reeled. Few expected Chinese AI companies could produce a high-quality AI model that rivals the best from OpenAI and Anthropic. While there are rumors that DeepSeek has access to 50,000 NVIDIA "Hopper" GPUs, including the H100, H800, and H20, it appears Huawei is ready to power Chinese AI infrastructure with its own accelerators. According to the South China Morning Post, Chinese cloud providers like SiliconFlow.cn are offering DeepSeek AI models for inference on Huawei Ascend 910B accelerators. At only one Yuan per million input tokens and four Yuan per million output tokens, this hosting model fundamentally undercuts competitors such as US-based cloud providers that offer DeepSeek R1 for $7 per million tokens.
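
Plugging the quoted rates into a quick back-of-the-envelope comparison (the CNY-to-USD exchange rate of roughly 7.2 is an assumption for illustration):

```python
# Back-of-the-envelope cost comparison using the figures in the article.
# The CNY/USD exchange rate is an assumption for illustration (~7.2).
CNY_PER_USD = 7.2

def ascend_hosted_cost_usd(input_tokens: float, output_tokens: float) -> float:
    # 1 yuan per million input tokens, 4 yuan per million output tokens
    cny = (input_tokens / 1e6) * 1 + (output_tokens / 1e6) * 4
    return cny / CNY_PER_USD

def us_hosted_cost_usd(input_tokens: float, output_tokens: float) -> float:
    # Flat $7 per million tokens, as quoted for US-based providers
    return (input_tokens + output_tokens) / 1e6 * 7

# Example workload: 10M input + 10M output tokens
print(f"${ascend_hosted_cost_usd(10e6, 10e6):.2f}")  # ~$6.94
print(f"${us_hosted_cost_usd(10e6, 10e6):.2f}")      # $140.00
```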

Not only is running on the Huawei Ascend 910B cheaper for cloud providers, but we have also reported that it is cheaper for DeepSeek itself, which serves its chat app on the Huawei Ascend 910C. Using domestic accelerators lowers the total cost of ownership, with savings passed down to users. Western clients who prefer AI inference served by Western companies will have to pay a heftier price, driven by the high cost of GPUs like the NVIDIA H100 and B100, and AMD's Instinct MI300X.

US Investigates Possible "Singapore" Loophole in China's Access to NVIDIA GPUs

Today, Bloomberg reported that the US government under the Trump administration is probing whether Chinese AI company DeepSeek circumvented export restrictions to acquire advanced NVIDIA GPUs through Singaporean intermediaries. The investigation follows concerns that DeepSeek's AI model, R1, reportedly rivaling leading systems from OpenAI and Google, may have been trained using restricted hardware that is barred from export to China. Singapore's role in NVIDIA's global sales has surged, with the nation accounting for 22% of the chipmaker's revenue in Q3 FY2025, up from 9% in Q3 FY2023. This spike coincides with tightened US export controls on AI chips to China, prompting speculation that Singapore serves as a conduit for Chinese firms to access high-end GPUs like the H100, which cannot be sold directly to China.

DeepSeek has not disclosed hardware details for R1 but revealed its earlier V3 model was trained using 2,048 H800 GPUs (2.8 million GPU hours), achieving efficiency surpassing Meta's Llama 3, which required 30.8 million GPU hours. Analysts suggest R1's performance implies even more powerful infrastructure, potentially involving restricted chips. US authorities, including the White House and FBI, are examining whether third parties in Singapore facilitated the transfer of controlled GPUs to DeepSeek. A well-known semiconductor analyst firm, SemiAnalysis, believes that DeepSeek acquired around 50,000 NVIDIA Hopper GPUs, which includes a mix of H100, H800, and H20. NVIDIA clarified that its reported Singapore revenue reflects "bill to" customer locations, not final destinations, stating most products are routed to the US or Western markets.
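
For a sense of scale, the disclosed V3 figures work out to roughly two months of wall-clock time on that cluster; a quick sanity check using only the article's numbers:

```python
# Sanity-check arithmetic on the disclosed DeepSeek V3 training figures.
gpus = 2048          # H800 GPUs used for V3, per DeepSeek's disclosure
gpu_hours = 2.8e6    # total GPU hours disclosed

wall_clock_days = gpu_hours / gpus / 24
print(f"~{wall_clock_days:.0f} days of wall-clock training")  # ~57 days

# Meta's quoted Llama 3 budget, for comparison:
print(f"~{30.8e6 / gpu_hours:.0f}x more GPU hours for Llama 3")  # ~11x
```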

DeepSeek-R1 Goes Live on NVIDIA NIM

DeepSeek-R1 is an open model with state-of-the-art reasoning capabilities. Instead of offering direct responses, reasoning models like DeepSeek-R1 perform multiple inference passes over a query, applying chain-of-thought, consensus and search methods to generate the best answer. Performing this sequence of inference passes, using reason to arrive at the best answer, is known as test-time scaling. DeepSeek-R1 is a perfect example of this scaling law, demonstrating why accelerated computing is critical for the demands of agentic AI inference.

As models are allowed to iteratively "think" through the problem, they create more output tokens and longer generation cycles, so model quality continues to scale. Significant test-time compute is critical to enable both real-time inference and higher-quality responses from reasoning models like DeepSeek-R1, requiring larger inference deployments. R1 delivers leading accuracy for tasks demanding logical inference, reasoning, math, coding and language understanding while also delivering high inference efficiency.
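
NIM microservices expose an OpenAI-compatible API, so querying the hosted R1 looks like any other chat-completions call. A sketch, with the base URL and model ID as assumptions taken from NVIDIA's hosted preview (check build.nvidia.com for current values):

```python
# Sketch of querying DeepSeek-R1 through an OpenAI-compatible NIM endpoint.
# The base URL and model ID below are assumptions based on NVIDIA's hosted
# preview; a self-hosted NIM container exposes the same API locally.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # hosted endpoint (assumed)
    api_key=os.environ["NVIDIA_API_KEY"],
)

completion = client.chat.completions.create(
    model="deepseek-ai/deepseek-r1",  # model ID (assumed)
    messages=[{"role": "user", "content": "Which is larger, 9.11 or 9.8?"}],
    temperature=0.6,
    max_tokens=4096,  # reasoning models emit long chains of thought
)
print(completion.choices[0].message.content)
```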

Microsoft Enables Distilled DeepSeek R1 Models on Copilot+ PCs, Starting with Qualcomm Snapdragon X

Microsoft will roll out "NPU-optimized versions of DeepSeek-R1" directly to Copilot+ PCs; yesterday's announcement revealed that Qualcomm Snapdragon X-equipped systems will be first in line to receive support. Owners of devices with Intel Core Ultra 200V "Lunar Lake" processors will have to wait a little longer, and reports suggest that AMD Ryzen AI 9 HX-based Copilot+ PCs will be third in Microsoft's queue. Interestingly, Team Red has published DeepSeek R1 model-related guides for Radeon RX graphics cards and Ryzen AI processors. Microsoft's first release is based on DeepSeek-R1-Distill-Qwen-1.5B, made available in AI Toolkit; teased 7B and 14B variants are expected to "arrive soon."

Microsoft reckons that the optimized models will "let developers build and deploy AI-powered applications that run efficiently on-device, taking full advantage of the powerful Neural Processing Units (NPUs) in Copilot+ PCs." The on-board AI-crunching solution is advertised as a tool for empowerment, allowing "developers to tap into powerful reasoning engines to build proactive and sustained experiences. With our work on Phi Silica, we were able to harness highly efficient inferencing—delivering very competitive time to first token and throughput rates, while minimally impacting battery life and consumption of PC resources." Western companies appear to be racing to adopt DeepSeek's open source model, due to apparent cost benefits. Certain North American organizations have disclosed their own views and reservations, but others will happily pay less for a potent alternative to locally-developed systems. In a separate bulletin (also posted on January 29), Microsoft's AI platform team revealed that a cloud-hosted DeepSeek R1 model is available on Azure AI Foundry and GitHub.
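
For that cloud-hosted route, serverless Azure AI Foundry deployments can be reached through the azure-ai-inference package. A minimal sketch; the endpoint, key, and deployment name are placeholders you would take from your own Foundry project:

```python
# Minimal sketch of calling a DeepSeek R1 deployment on Azure AI Foundry
# via the azure-ai-inference package. The endpoint and key are placeholders
# from your own Foundry project; the model name is whatever you deployed.
import os
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import UserMessage
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_KEY"]),
)

response = client.complete(
    messages=[UserMessage(content="Summarize test-time scaling in two sentences.")],
    model="DeepSeek-R1",  # deployment name in your project (assumed)
)
print(response.choices[0].message.content)
```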

AMD Details DeepSeek R1 Performance on Radeon RX 7900 XTX, Confirms Ryzen AI Max Memory Sizes

AMD today put out detailed guides on how to get DeepSeek R1 distilled reasoning models running on Radeon RX graphics cards and Ryzen AI processors. The guide confirms that the new Ryzen AI Max "Strix Halo" processors come hardwired to LPCAMM2 memory configurations of 32 GB, 64 GB, and 128 GB, and there won't be a 16 GB memory option for notebook manufacturers to cheap out with. The guide goes on to explain that "Strix Halo" will be able to locally accelerate DeepSeek-R1-Distill-Llama with 70 billion parameters on the 64 GB and 128 GB memory configurations of "Strix Halo" powered notebooks, while the 32 GB model should be able to run DeepSeek-R1-Distill-Qwen-32B. Ryzen AI "Strix Point" mobile processors should be capable of running DeepSeek-R1-Distill-Qwen-14B and DeepSeek-R1-Distill-Llama-14B on their RDNA 3.5 iGPUs and NPUs, while older-generation "Phoenix Point" and "Hawk Point" chips should be capable of DeepSeek-R1-Distill-Llama-14B. The company recommends running all of the above distills in Q4_K_M quantization.
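
Those memory-tier recommendations line up with simple quantization arithmetic: at Q4_K_M, roughly 4.5 bits per weight, the weights of a 70-billion-parameter distill alone occupy around 40 GB before KV cache and runtime overhead. A rough estimate (the bits-per-weight figure is an approximation, not from AMD's guide):

```python
# Rough weight-memory estimate for Q4_K_M quantization. ~4.5 bits per
# weight is an approximation; exact GGUF file sizes vary by tensor layout,
# and KV cache plus runtime overhead come on top of the weights.
def quantized_weight_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for params in (8, 14, 32, 70):
    print(f"{params:>3}B params -> ~{quantized_weight_gb(params):.1f} GB of weights")
# 8B -> ~4.5 GB, 14B -> ~7.9 GB, 32B -> ~18.0 GB, 70B -> ~39.4 GB:
# hence 70B needs the 64/128 GB tiers while 32B fits the 32 GB configuration.
```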

Switching gears to discrete graphics cards, AMD is only recommending its Radeon RX 7000 series for now, since the RDNA 3 graphics architecture introduced AI accelerators. The flagship Radeon RX 7900 XTX is recommended for the DeepSeek-R1-Distill-Qwen-32B distill, while all SKUs with 12 GB to 20 GB of memory (the RX 7600 XT, RX 7700 XT, RX 7800 XT, RX 7900 GRE, and RX 7900 XT) are recommended for distills up to DeepSeek-R1-Distill-Qwen-14B. The mainstream RX 7600 with its 8 GB of memory is only recommended for up to DeepSeek-R1-Distill-Llama-8B. You will need LM Studio 0.3.8 or later and Radeon Software Adrenalin 25.1.1 beta or later drivers. AMD put out first-party LM Studio 0.3.8 tokens-per-second performance numbers for the RX 7900 XTX, comparing it with the NVIDIA GeForce RTX 4080 SUPER and the RTX 4090.
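
Once a distill is loaded, LM Studio's local server speaks the OpenAI chat-completions dialect, by default on port 1234, so the Radeon-accelerated model can be queried like any hosted endpoint. A sketch assuming default settings and an illustrative model identifier:

```python
# Querying a DeepSeek-R1 distill served by LM Studio's local server.
# Assumes LM Studio's default OpenAI-compatible endpoint (localhost:1234)
# and that a distill such as the Qwen-32B GGUF is already loaded; the
# model identifier below is illustrative -- use whatever LM Studio shows.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

completion = client.chat.completions.create(
    model="deepseek-r1-distill-qwen-32b",  # illustrative identifier
    messages=[{"role": "user", "content": "Write a haiku about GPUs."}],
)
print(completion.choices[0].message.content)
```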