News Posts matching #Benchmarks


Industry's First-to-Market Supermicro NVIDIA HGX B200 Systems Demonstrate AI Performance Leadership

Super Micro Computer, Inc. (SMCI), a Total IT Solution Provider for AI/ML, HPC, Cloud, Storage, and 5G/Edge, has announced first-to-market, industry-leading performance on several MLPerf Inference v5.0 benchmarks using its NVIDIA HGX B200 8-GPU systems. The 4U liquid-cooled and 10U air-cooled systems achieved the best performance in select benchmarks. Supermicro demonstrated more than 3 times the tokens-per-second (Token/s) generation for the Llama2-70B and Llama3.1-405B benchmarks compared to H200 8-GPU systems. "Supermicro remains a leader in the AI industry, as evidenced by the first new benchmarks released by MLCommons in 2025," said Charles Liang, president and CEO of Supermicro. "Our building block architecture enables us to be first-to-market with a diverse range of systems optimized for various workloads. We continue to collaborate closely with NVIDIA to fine-tune our systems and secure a leadership position in AI workloads." Learn more about the new MLPerf v5.0 Inference benchmarks here.

Supermicro is the only system vendor publishing record MLPerf Inference performance (on select benchmarks) for both the air-cooled and liquid-cooled NVIDIA HGX B200 8-GPU systems. Both air-cooled and liquid-cooled systems were operational before the MLCommons benchmark start date, and Supermicro engineers optimized the systems and software, as allowed by the MLCommons rules, to showcase the impressive performance. Within the operating margin, the Supermicro air-cooled B200 system exhibited the same level of performance as the liquid-cooled B200 system. Supermicro was already delivering these systems to customers while the benchmarks were being conducted. MLCommons requires that all results be reproducible, that the products be available, and that the results can be audited by other MLCommons members.

AMD Instinct GPUs are Ready to Take on Today's Most Demanding AI Models

Customers evaluating AI infrastructure today rely on a combination of industry-standard benchmarks and real-world model performance metrics—such as those from Llama 3.1 405B, DeepSeek-R1, and other leading open-source models—to guide their GPU purchase decisions. At AMD, we believe that delivering value across both dimensions is essential to driving broader AI adoption and real-world deployment at scale. That's why we take a holistic approach—optimizing performance for rigorous industry benchmarks like MLPerf while also enabling Day 0 support and rapid tuning for the models most widely used in production by our customers.

This strategy helps ensure AMD Instinct GPUs deliver not only strong, standardized performance, but also high-throughput, scalable AI inferencing across the latest generative and language models used by customers. We will explore how AMD's continued investment in benchmarking, open model enablement, software and ecosystem tools helps unlock greater value for customers—from MLPerf Inference 5.0 results to Llama 3.1 405B and DeepSeek-R1 performance, ROCm software advances, and beyond.

NVIDIA GeForce RTX 5080 Mobile GPU Benched, Approximately 10% Slower Than RTX 5090 Mobile

NVIDIA and its laptop manufacturing partners managed to squeeze out higher-end models at the start of the week (March 31), qualifying just in time for a Q1 2025 launch. As predicted by PC gaming hardware watchdogs, conditions on day one—for the general public—were far from perfect. Media and influencer outlets received pre-launch evaluation units, but Monday's embargo lift did not open the floodgates to a massive number of published/uploaded reviews. Independent benchmarking of Team Green's flagship—GeForce RTX 5090 Mobile—produced somewhat underwhelming results. To summarize, several outlets—including Notebookcheck—observed NVIDIA's topmost laptop-oriented GPU trailing way behind its desktop equivalent in lab tests. Notebookcheck commented on these findings: "laptop gamers will want to keep their expectations in check as the mobile GeForce RTX 5090 can be 50 percent slower than the desktop counterpart as shown by our benchmarks. The enormous gap between the mobile RTX 5090 and desktop RTX 5090 and the somewhat disappointing leap over the outgoing mobile RTX 4080 can be mostly attributed to TGP."

The German online publication was more impressed with NVIDIA's sub-flagship model: two Ryzen 9 9955HX-powered Schenker XMG Neo 16 test units—sporting almost identical specifications—were pitched against each other, and a resultant mini-review of benched figures was made available earlier today. Notebookcheck's Allen Ngo provided some context: "3DMark benchmarks...show that the (Schenker Neo's) GeForce RTX 5080 Mobile unit is roughly 10 to 15 percent slower than its pricier sibling. This deficit translates fairly well when running actual games like Baldur's Gate 3, Final Fantasy XV, Alan Wake 2, or Assassin's Creed Shadows. As usual, the deficit is widest when running at 4K resolutions on demanding games and smallest when running at lower resolutions where graphics become less GPU bound. A notable observation is that the performance gap between the mobile RTX 5080 and mobile RTX 5090 would remain the same, whether or not DLSS is enabled. When running Assassin's Creed Shadows with DLSS on, for example, the mobile RTX 5090 would maintain its 15 percent lead over the mobile RTX 5080. The relatively small performance drop between the two enthusiast GPUs means it may be worth configuring laptops with the RTX 5080 instead of the RTX 5090 to save on hundreds of dollars or for better performance-per-dollar." As demonstrated by Bestware.com's system configurator, the XMG NEO 16 (A25) SKU with a GeForce RTX 5090 Mobile GPU demands an €855 (~$928 USD) upcharge over an RTX 5080-based build.

XPG Breaks World Record Again, LANCER RGB DDR5 Memory Hits 12,762 MT/s Overclocking Milestone

XPG, a gaming brand of ADATA Technology, a global leader in memory modules and flash memory, is proud to announce its successful collaboration with GIGABYTE Technology. Leveraging XPG LANCER RGB DDR5 memory and LN2 (liquid nitrogen) cooling technology, the two companies achieved an astonishing 12,762 MT/s overclocking speed, breaking the DDR5 memory overclocking world record. This remarkable achievement has been verified by the renowned international overclocking scoring platform, HWBOT, reinforcing XPG's leadership in extreme overclocking memory.

A Legendary Overclocking Record By Two Industry Titans
This record-breaking overclocking effort was spearheaded by GIGABYTE's renowned overclocker and engineer, HiCookie. The setup featured the Z890 AORUS TACHYON ICE motherboard, meticulously designed for overclocking, paired with the XPG LANCER RGB DDR5 memory and an Intel Core Ultra 9 285K processor. The XPG LANCER's optimized PCB design, IC tuning capabilities, and integrated circuit technology ensured stable operation even under the extreme LN2 environment, laying a crucial foundation for this world record. The Z890 AORUS TACHYON ICE motherboard, boasting an array of overclocking-centric features, including an OC button and an LN2_SW liquid nitrogen mode switch, further facilitated the CPU's ability to achieve its peak performance under extreme conditions.
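For context on the headline figure, DDR5 is double data rate memory, so the quoted transfer rate corresponds to roughly half that value in actual memory clock. A minimal sketch of the relationship, assuming the usual two transfers per clock cycle:

```python
# Rough sketch: relate a DDR5 transfer rate to its underlying memory clock.
# DDR ("double data rate") memory moves data on both clock edges, so the real
# clock is roughly half the quoted MT/s figure.

def ddr_clock_mhz(transfer_rate_mts: float) -> float:
    """Approximate memory clock (MHz) for a given DDR transfer rate (MT/s)."""
    return transfer_rate_mts / 2

record_mts = 12_762  # HWBOT-validated XPG LANCER RGB result cited above
print(f"{record_mts} MT/s ≈ {ddr_clock_mhz(record_mts):.0f} MHz memory clock")
# -> 12762 MT/s ≈ 6381 MHz memory clock
```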

AMD Ryzen 5 9600 Nearly Matches 9600X in Early Benchmarks

The AMD Ryzen 5 9600 launched recently as a slightly more affordable variant of the popular Ryzen 5 9600X. Despite launching over a month ago, the 9600 still appears rather difficult to track down in retail stores. However, a recent PassMark benchmark has provided some insights as to the performance of the non-X variant of AMD's six-core Zen 5 budget CPU. Unsurprisingly, the Ryzen 5 9600X and the Ryzen 5 9600 are neck-and-neck, with the 9600X scraping past its non-X counterpart by a mere 2.2% in the CPU benchmark.

According to the PassMark result, the Ryzen 5 9600 scored 29,369 points, compared to the Ryzen 5 9600X's 30,016, while single-core scores were 4581 for the 9600X and 4433 points for the 9600, representing a 3.2% disparity between the two CPUs. The result is not surprising, since the only real difference between the 9600 and the 9600X is a 200 MHz lower boost clock. All other specifications, including TDP, core count, cache capacity, and base clock speed, are identical. Both CPUs are also unlocked for overclocking, and both feature AMD Precision Boost 2. While the Ryzen 5 9600 isn't widely available just yet, it will seemingly be a good option for those who want to stretch their budget to the absolute maximum, since recent reports indicate that it will be around $20 cheaper than the Ryzen 5 9600X, coming in at around the $250-260 mark.
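As a quick illustration of where those percentages come from, here is a minimal sketch that reproduces the gaps from the PassMark scores cited above, with the non-X part's deficit expressed relative to the 9600X:

```python
# Minimal sketch: reproduce the quoted gaps from the PassMark scores cited above.
scores = {
    "Ryzen 5 9600X": {"multi": 30_016, "single": 4_581},
    "Ryzen 5 9600":  {"multi": 29_369, "single": 4_433},
}

for metric in ("multi", "single"):
    x_part = scores["Ryzen 5 9600X"][metric]
    non_x = scores["Ryzen 5 9600"][metric]
    gap = (x_part - non_x) / x_part * 100  # non-X deficit, relative to the 9600X
    print(f"{metric}-thread: Ryzen 5 9600 trails the 9600X by {gap:.1f}%")
# -> multi-thread: Ryzen 5 9600 trails the 9600X by 2.2%
# -> single-thread: Ryzen 5 9600 trails the 9600X by 3.2%
```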

AMD-built Radeon RX 9070 non-XT Tested Out by Chiphell Member

Around late January, out-of-date AMD marketing material teased the existence of a Radeon RX 9070 series reference card design. Almost a month later, PC hardware news outlets picked up on an official signal about Team Red's launch lineup consisting entirely of board partner-produced options. First-party enthusiasts were disappointed by the apparent total lack of "Made by AMD" (MBA) solutions, but some unusual specimens appeared online roughly two weeks post-RDNA 4's launch. Reports pointed to triple-fan Radeon RX 9070 XT and dual-fan RX 9070 MBA cards being exchanged for cash via Chinese black market channels. Photographed examples seemed to sport a somewhat muted black shroud design—not quite as exciting when compared to AMD's marketed/rendered brushed metal effect promo units.

Members of the Chiphell forum have spent months leaking many aspects of Team Red's foray into a new generation of graphics architecture—going back to the days of old nomenclature: Radeon RX 8800 XT. Yesterday, one participant revealed their fresh purchase of a Radeon RX 9070 non-XT MBA card. They sold their old GeForce RTX 4070 SUPER 12 GB graphics card, in favor of Navi 48 GPU-based OEM hardware. The post focused mainly on photo uploads and screenshots, but a brief description stated: "purchased at original price (TPU note: presumably 4499 RMB), room temperature is 16 degrees Celsius. Dual fans on the front. The back panel has an AMD logo, but it's a sticker." As theorized by VideoCardz, AMD likely produced a limited number of pre-release "public" MBA cards. The publication reckons that partner companies have received a smattering of samples for evaluation or software development purposes. The presence of an old school Radeon logo (pre-RDNA era) is a head scratcher, given the unit's supposed first-party origin.

GALAX RTX 5090D HOF XOC LE Card Overclocked to 3.27 GHz, Record Breaking Prototype Enabled w/ Second 12V-2×6 Connector

As reported last month, GALAX had distributed prototypes of its upcoming flagship "Hall of Fame" (HOF) card—based on NVIDIA's Chinese market exclusive GeForce RTX 5090D GPU—to prominent figures within the PC hardware overclocking community. Earlier examples sported single 12V-2×6 power connectors, although GALAX's exposed white PCB design showed extra space for an additional unit. Evaluators conducted experiments involving liquid nitrogen-based cooling methods. The most vocal of online critics questioned the overclocking capability of initial GeForce RTX 5090D HOF samples, due to limitations presented by a lone avenue of power delivery. A definitive answer has arrived in the form of the manufacturer's elite team-devised GeForce RTX 5090D HOF Extreme Overclock (XOC) Lab Limited Edition candidate; a newer variant that makes use of dual 12V-2×6 power connectors. Several overclocking experts have entered into a GALAX-hosted competition—Micka:)Shu, a Chinese participant, posted photos of their test rig setup (see below).

Micka's early access sample managed to achieve the top GPU placement on UL Benchmarks' 3DMark Speed Way Hall of Fame, with a final score of 17,169 points. A screenshotted GPU-Z session shows the card's core frequency reaching 3277 MHz. Around late January, ASUS China's general manager (Tony Yu) documented his overclocking of a ROG Astral RTX 5090 D GAMING OC specimen up to 3.4 GHz under liquid nitrogen-cooled conditions. GALAX has similarly outfitted its flagship model with selectively binned components and an "over-engineered" design. The company's "bog-standard" HOF model is no slouch, despite the limitation imposed by a single power connector. The GALAX OC Facebook account sent out some appreciation to another noted competitor (and collaborator): "thanks to Overclocked Gaming Systems—OGS Rauf for help with the overclock of GeForce RTX 5090D HOF, and all of (our) GALAX products." The OGS member set world records with said "normal" HOF card—achieving scores of 59,072 points in the 3DMark Fire Strike Extreme test, and 25,040 points in Unigine Superposition (8K Optimized preset).

AMD Ryzen 9 9950X3D Leaked PassMark Score Shows 14% Single Thread Improvement Over Predecessor

Last Friday, AMD confirmed finalized price points for its upcoming Ryzen 9 9950X3D ($699) and 9900X3D ($599) gaming processors—both launching on March 12. Media outlets are very likely finalizing their evaluations of review silicon; official embargoes are due for lifting tomorrow (March 11). By Team Red decree, a drip feed of pre-launch information was restricted to teasers, a loose March launch window, and an unveiling of basic specifications (at CES 2025). A trickle of mid-January to early March leaks has painted an incomplete picture of performance expectations for the 3D V-Cache-equipped 16 and 12-core parts. A fresh NDA-busting disclosure has arrived online, courtesy of an alleged Ryzen 9 9950X3D sample's set of benchmark scores.

A pre-release candidate posted single and multi-thread ratings of 4739 and 69,701 (respectively) upon completion of PassMark tests. Based on this information, a comparison chart was assembled—pitting the Ryzen 9 9950X3D against its direct predecessor (7950X3D), a Zen 5 relative (9950X), and competition from Intel (Core Ultra 9 285K). AMD's brand-new 16-core flagship managed to outpace the previous-gen Ryzen 9 7950X3D by ~14% in single-thread stakes, and roughly 11% in multi-threaded scenarios. Test system build details and settings were not mentioned with this leak—we expect to absorb a more complete picture tomorrow, upon publication of widespread reviews. The sampled Ryzen 9 9950X3D CPU surpassed its 9950X sibling by ~5% with its multi-thread result; the two processors are just about equal in terms of single-core performance. The Intel Core Ultra 9 285K CPU posted the highest single-core result within the comparison—5078 points—exceeding the 9950X3D's tally by about 7%, although the AMD chip pulls ahead by ~3% in recorded multi-thread performance. Keep an eye on TechPowerUp's review section, where W1zzard will be delivering his verdict(s) imminently.

AMD Ryzen 9 9950X3D Leaked 3DMark & Cinebench Results Indicate 9950X-esque Performance

The AMD Ryzen 9 9950X3D processor will head to retail next month—a March 12 launch day is rumored—but a handful of folks seem to have early samples in their possession. Reviewers and online influencers have been tasked with evaluating pre-launch silicon, albeit under strict conditions; i.e. no leaking. Inevitably, NDA-shredding material has seeped out—yesterday, we reported on an alleged sample's ASUS Silicon Prediction rating. Following that, a Bulgarian system integrator/hardware retailer decided to upload Cinebench R23 and 3DMark Time Spy results to Facebook. Evidence of this latest leak was scrubbed at the source, but VideoCardz preserved crucial details.

The publication noticed distinguishable QR and serial codes in PCbuild.bg's social media post, so tracing activities could sniff out points of origin. As expected, the leaked benchmark data points were compared to Ryzen 9 9950X and 7950X3D scores. The Ryzen 9 9950X3D sample recorded a score of 17,324 points in 3DMark Time Spy, as well as 2279 points (single-core) and 42,423 points (multi-core) in Cinebench R23. Notebookcheck observed that the pre-launch candidate came: "out ahead of the Ryzen 9 7950X3D in both counts, even if the gaming win is less than significant. Comparing the images of the benchmark results to our in-house testing and benchmark database shows the 9950X3D beating the 7950X3D by nearly 17% in Cinebench multicore." When compared to its non-3D V-Cache equivalent, the Ryzen 9 9950X3D holds a slight performance advantage. A blurry shot of PCbuild.bg's HWiNFO session shows the leaked processor's core clock speeds going up to 5.7 GHz (turbo) on the non-X3D CCD, while the X3D-equipped portion seems capable of going up to 5.54 GHz.

AMD & Nexa AI Reveal NexaQuant's Improvement of DeepSeek R1 Distill 4-bit Capabilities

Nexa AI today announced NexaQuants of two DeepSeek R1 Distills: the DeepSeek R1 Distill Qwen 1.5B and DeepSeek R1 Distill Llama 8B. Popular quantization methods like the llama.cpp-based Q4_K_M allow large language models to significantly reduce their memory footprint, typically incurring only a low perplexity loss for dense models as a tradeoff. However, even low perplexity loss can result in a reasoning capability hit for (dense or MoE) models that use Chain of Thought traces. Nexa AI has stated that NexaQuants are able to recover this reasoning capability loss (compared to the full 16-bit precision) while keeping the 4-bit quantization and retaining its performance advantage. Benchmarks provided by Nexa AI can be seen below.

We can see that the Q4_K_M-quantized DeepSeek R1 distills score slightly lower (except for the AIME24 bench on the Llama 3 8B distill, which scores significantly lower) in LLM benchmarks like GPQA and AIME24 compared to their full 16-bit counterparts. Moving to a Q6 or Q8 quantization would be one way to address this, but it would result in the model becoming slightly slower to run and requiring more memory. Nexa AI has stated that NexaQuants use a proprietary quantization method to recover the loss while keeping the quantization at 4 bits. This means users can theoretically get the best of both worlds: accuracy and speed.
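To see why the 4-bit format is attractive in the first place, here is a rough, weights-only sketch of the memory footprint of an 8B-parameter model at different precisions. The bits-per-weight values for Q6 and Q4_K_M are approximate effective figures (llama.cpp mixes block formats and adds per-group scales), so treat the output as ballpark only:

```python
# Back-of-envelope sketch (weights only): approximate storage needed for an
# 8B-parameter model at different precisions. Ignores KV cache, activations,
# and llama.cpp block overhead; Q6/Q4_K_M bits-per-weight are rough estimates.

def weight_footprint_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB for a given parameter count and precision."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9

model_b = 8  # e.g. the DeepSeek R1 Distill Llama 8B mentioned above
for label, bits in [("FP16", 16), ("Q8", 8), ("Q6 (approx.)", 6.6), ("Q4_K_M (approx.)", 4.8)]:
    print(f"{label:>17}: ~{weight_footprint_gb(model_b, bits):.1f} GB of weights")
```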

NVIDIA GeForce RTX 5070 Ti Allegedly Scores 16.6% Improvement Over RTX 4070 Ti SUPER in Synthetic Benchmarks

Thanks to some early 3DMark benchmarks obtained by VideoCardz, we now have an interesting picture of the performance gains NVIDIA's upcoming GeForce RTX 5070 Ti GPU offers over its predecessor. Testing conducted with AMD's Ryzen 7 9800X3D processor and 48 GB of DDR5-6000 memory has provided the first glimpse into the card's capabilities. The new GPU demonstrates a 16.6% performance improvement over its predecessor, the RTX 4070 Ti SUPER. However, benchmark data shows it falling short of the more expensive RTX 5080 by 13.2%, raising questions about the price-to-performance ratio given the $250 price difference between the two cards. Priced at $749 MSRP, the RTX 5070 Ti could be even pricier in retail channels at launch, especially with limited availability. The card's positioning becomes particularly interesting compared to the RTX 5080's $999 price point, which commands a 33% premium for its additional performance capabilities.

As a reminder, the RTX 5070 Ti boasts 8,960 CUDA cores, 280 texture units, 70 RT cores for ray tracing, and 280 tensor cores for AI computations, all supported by 16 GB of GDDR7 memory running at 28 Gbps effective speed across a 256-bit bus interface, resulting in 896 GB/s of bandwidth. We have to wait for proper reviews for the final performance verdict, as synthetic benchmarks tell only part of the story. Modern gaming demands consideration of advanced features such as ray tracing and upscaling technologies, which can significantly impact real-world performance. The true test will come from comprehensive gaming benchmarks across a variety of titles and scenarios. The gaming community won't have to wait long for detailed analysis, as official reviews are reportedly set to be released in just a few days. Additional evaluations of non-MSRP versions should follow on February 20, the card's launch date.
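The 896 GB/s figure follows directly from the quoted memory specification; a minimal sketch of the arithmetic (per-pin data rate times bus width, converted from bits to bytes):

```python
# Sketch: derive the quoted 896 GB/s from the cited memory specification.
data_rate_gbps_per_pin = 28  # GDDR7 effective speed (Gbps per pin)
bus_width_bits = 256         # memory interface width

bandwidth_gb_s = data_rate_gbps_per_pin * bus_width_bits / 8  # bits -> bytes
print(f"Peak memory bandwidth: {bandwidth_gb_s:.0f} GB/s")  # -> 896 GB/s
```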

UL Solutions Adds Support for DLSS 4 and DLSS Multi Frame Generation to the 3DMark NVIDIA DLSS Feature Test

We're excited to announce that in today's update to 3DMark, we're adding support for DLSS 4 and DLSS Multi Frame Generation to the NVIDIA DLSS feature test. The NVIDIA DLSS feature test and this update were developed in partnership with NVIDIA. The 3DMark NVIDIA DLSS feature test lets you compare the performance and image quality brought by enabling DLSS processing. If you have a new GeForce RTX 50 Series GPU, you'll also be able to compare performance with and without the full capabilities of DLSS 4.

You can choose to run the NVIDIA DLSS feature test using DLSS 4, DLSS 3 or DLSS 2. DLSS 4 includes the new DLSS Multi Frame Generation feature, and you can choose between several image quality modes—Quality, Balanced, Performance, Ultra Performance and DLAA. These modes are designed for different resolutions, from Full HD up to 8K. DLSS Multi Frame Generation uses AI to boost frame rates with up to three additional frames generated per traditionally rendered frame. In the 3DMark NVIDIA DLSS feature test, you are able to choose between 2x, 3x and 4x Frame Generation settings if you have an NVIDIA GeForce RTX 50 series GPU.
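As a rough illustration of what the 2x, 3x and 4x settings mean in practice, here is a toy sketch relating traditionally rendered frame rate to the best-case displayed frame rate; the 60 FPS input is an invented example rather than a 3DMark result, and real-world scaling will be lower due to generation overhead:

```python
# Toy illustration only: relate traditionally rendered FPS to the best-case
# displayed FPS for the 2x/3x/4x Frame Generation settings described above.

def displayed_fps(rendered_fps: float, fg_factor: int) -> float:
    """fg_factor = total frames shown per traditionally rendered frame (2, 3 or 4)."""
    return rendered_fps * fg_factor

for factor in (2, 3, 4):
    print(f"{factor}x FG: 60 rendered FPS -> up to {displayed_fps(60, factor):.0f} displayed FPS")
```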

UL Adds New DirectStorage Test to 3DMark

Today we're excited to launch the 3DMark DirectStorage feature test. This feature test is a free update for the 3DMark Storage Benchmark DLC. The 3DMark DirectStorage feature test helps gamers understand the potential performance benefits that Microsoft's DirectStorage technology could have for their PC's gaming performance.

DirectStorage is a Microsoft technology for Windows PCs with PCIe SSDs that reduces the overhead when loading game data. DirectStorage can be used to reduce game loading times when paired with other technologies such as GDeflate, where the GPU can be used to decompress certain game assets instead of the CPU. On systems running Windows 11, DirectStorage can bring further benefits with BypassIO, lowering a game's CPU overhead by reducing the CPU workload when transferring data.

SPEC Delivers Major SPECworkstation 4.0 Benchmark Update, Adds AI/ML Workloads

The Standard Performance Evaluation Corporation (SPEC), the trusted global leader in computing benchmarks, today announced the availability of the SPECworkstation 4.0 benchmark, a major update to SPEC's comprehensive tool designed to measure all key aspects of workstation performance. This significant upgrade from version 3.1 incorporates cutting-edge features to keep pace with the latest workstation hardware and the evolving demands of professional applications, including the increasing reliance on data analytics, AI and machine learning (ML).

The new SPECworkstation 4.0 benchmark provides a robust, real-world measure of CPU, graphics, accelerator, and disk performance, ensuring professionals have the data they need to make informed decisions about their hardware investments. The benchmark caters to the diverse needs of engineers, scientists, and developers who rely on workstation hardware for daily tasks. It includes real-world applications like Blender, Handbrake, LLVM and more, providing a comprehensive performance measure across seven different industry verticals, each focusing on specific use cases and subsystems critical to workstation users. SPECworkstation 4.0 benchmark marks a significant milestone for measuring workstation AI performance, providing an unbiased, real-world, application-driven tool for measuring how workstations handle AI/ML workloads.

ScaleFlux SFX 5016 Will Set New Benchmarks for Enterprise SSD Efficiency and AI Workload Performance

As the IT sector continues to seek answers for scaling data processing performance while simultaneously improving efficiency - in terms of performance and density per watt, per system, per rack, and per dollar of CapEx and OpEx - ScaleFlux is answering the call with innovative design choices in its SSD controllers. The SFX 5016 promises to set new standards both for performance and for power efficiency.

In addition to carrying forward the transparent compression feature that ScaleFlux first released in 2020 and upgraded in 2022 with the SFX 3016 computational storage drive controller, the new SFX 5016 SoC processor includes a number of design advances.

Apple MacBook Air M3 Teardown Reveals Two NAND Chips on Basic 256 GB Config

Apple introduced its new generation of MacBook Air subcompact laptops last week—its press material focused mostly on the "powerful M3 chip" and its more efficient Neural Engine. Storage options were not discussed deeply—you had to dive into the Air M3's configuration page or specification sheet to find out more. Media outlets have highlighted a pleasing upgrade for entry-level models in the area of internal SSD transfer speeds. Apple has seemingly taken on board feedback regarding the disappointing performance of its basic MacBook Air M2 model—its 256 GB storage solution houses a lone 3D NAND package. Max Tech's Vadim Yuryev was one of the first media personalities to discover the presence of two NAND flash chips within entry-level MacBook Air M3 systems—his channel's video teardown can be watched below.

The upgrade from a single chip to a twin configuration has granted higher read and write speeds—Yuryev shared Blackmagic SSD speed test results; screengrabs from his video coverage are attached to this article. The M3 MacBook Air's 256 GB solution achieved write speeds of 2,108 MB/s, 33% faster than an equivalent M2 MacBook Air configuration. The M3 model recorded read speeds of 2,880 MB/s—Wccftech was suitably impressed by this achievement: "making it a whopping 82 percent [faster] than its direct predecessor, making it quite an impressive result. The commendable part is that Apple does not require customers to upgrade to the 512 GB storage variants of the M3 MacBook Air to witness higher read and write speeds." Performance is still no match when lined up against "off-the-shelf" PCIe 3.0 x4 drives, and tech enthusiasts find the entry price point of $1099 laughable. Apple's lowest-rung option nets a 13-inch model that packs a non-upgradable 8 GB of RAM and 256 GB of storage. Early impressions have also put a spotlight on worrying thermal issues—Apple's fan-less cooling solution is reportedly struggling to tame the newly launched M3 chip.

AMD Ryzen 7 8840U "Hawk Point" APU Exceeds Expectations in 10 W TDP Gaming Test

AMD Ryzen 8040 "Hawk Point" mobile processors continue to roll out in all sorts of review sample guises—mostly within laptops/notebooks and handheld gaming PC segments. An example of the latter would be GPD's Hawk Point-refreshed Win Max 2 model—Cary Golomb, a tech reviewer and self-described evangelist of "PC Gaming Handhelds Since 2016" has acquired this device for benchmark comparison purposes. A Ryzen 7 8840U-powered GPD Win Max 2 model was pitched against similar devices that house older Team Red APU technologies. Golomb's collection included Valve's Steam Deck LCD model, and three "Phoenix" Ryzen 7840U-based GPD models. He did not have any top-of-the-line ASUS or Lenovo handhelds within reach, but the onboard Ryzen Z1 Extreme APU is a close relative of 7840U.

Golomb's social media post included a screenshot of a Batman: Arkham Knight "average frames per second" comparison chart—all devices were running on a low 10 W TDP setting. The overall verdict favors AMD's new Hawk Point part: "Steam Deck low TDP performance finally dethroned...GPD continues to make the best AMD devices. 8840U shouldn't be better, but everywhere I'm testing, it is consistently better across every TDP. TSP measuring similar." Hawk Point appears to be a slight upgrade over Phoenix—most of the generational improvements reside within a more capable XDNA NPU, so it is interesting to see that the 8840U outperforms its predecessor. They both sport AMD's Radeon 780M integrated graphics solution (RDNA 3), while the standard/first iteration Steam Deck makes do with an RDNA 2-era "Van Gogh" iGPU. Golomb found that the: "three other GPD 7840U devices behaved somewhat consistently."

MSI Claw Review Units Observed Trailing Behind ROG Ally in Benchmarks

Chinese review outlets have received MSI Claw sample units—the "Please, Xiao Fengfeng" Bilibili video channel has produced several comparison pieces detailing how the plucky Intel Meteor Lake-powered handheld stands up against its closest rival, the ASUS ROG Ally. The latter utilizes an AMD Ryzen Z1 APU—in Extreme or Standard forms—and many news outlets have pointed out that the Z1 Extreme processor is a slightly reworked Ryzen 7 7840U "Phoenix" processor. Intel and its handheld hardware partners have not dressed up Meteor Lake chips with alternative gaming monikers—simply put, the MSI Claw arrives with Core Ultra 7-155H or Ultra 5-135H processors onboard. The two rival systems both run on Windows 11, and also share the same screen size, resolution, display technology (IPS) and 16 GB LPDDR5-6400 memory configuration. The almost eight-month-old ASUS handheld seems to outperform its near-launch competition.

Xiao Fengfeng's review (Ultra 7-155H versus Z1 Extreme) focuses on different power levels and how they affect handheld performance—the Claw and Ally have user selectable TDP modes. A VideoCardz analysis piece lays out key divergences: "Both companies offer easy TDP profile switches, allowing users to adjust performance based on the game's requirements or available battery life. The Claw's larger battery could theoretically offer more gaming time or higher TDP with the same battery life. The system can work at 40 W TDP level (but in reality it's between 35 and 40 watts)...In the Shadow of the Tomb Raider test, the Claw doesn't seem to outperform the ROG Ally. According to a Bilibili creator's test, the system falls short at four different power levels: 15 W, 20 W, 25 W, and max TDP (40 W for Claw and 30 W for Ally)."

AMD Develops ROCm-based Solution to Run Unmodified NVIDIA's CUDA Binaries on AMD Graphics

AMD has quietly funded an effort over the past two years to enable binary compatibility for NVIDIA CUDA applications on its ROCm stack. This allows CUDA software to run on AMD Radeon GPUs without adapting the source code. The project responsible is ZLUDA, which was initially developed to provide CUDA support on Intel graphics. The developer behind ZLUDA, Andrzej Janik, was contracted by AMD in 2022 to adapt his project for use on Radeon GPUs with HIP/ROCm. He spent two years bringing functional CUDA support to AMD's platform, allowing many real-world CUDA workloads to run without modification. AMD decided not to productize this effort for unknown reasons but did open-source it once funding ended, per their agreement. Over at Phoronix, AMD's ZLUDA implementation was put through a wide variety of benchmarks.

Benchmarks found that proprietary CUDA renderers and software worked on Radeon GPUs out-of-the-box with the drop-in ZLUDA library replacements. CUDA-optimized Blender 4.0 rendering now runs faster on AMD Radeon GPUs than the native ROCm/HIP port, reducing render times by around 10-20%, depending on the scene. The implementation is surprisingly robust, considering it was a single-developer project. However, there are some limitations—OptiX and PTX assembly code are not yet fully supported. Overall, though, testing showed very promising results. Compared to the generic OpenCL runtimes in Geekbench, CUDA-optimized binaries produce up to 75% better results. With the ZLUDA libraries handling API translation, unmodified CUDA binaries can now run directly on top of ROCm and Radeon GPUs. Strangely, the ZLUDA port targets AMD ROCm 5.7, not the newest 6.x versions. Only time will tell if AMD continues investing in this approach to simplify porting of CUDA software. However, the open-sourced project now enables anyone to contribute and help improve compatibility. For a complete review, check out the Phoronix tests.
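To make the "drop-in library replacement" idea concrete, below is a toy illustration in Python, purely for exposition (ZLUDA itself is written in Rust and covers a far larger API surface): a translation layer exposes the CUDA-named entry points an unmodified binary expects and forwards each call to its HIP/ROCm equivalent.

```python
# Toy illustration of API translation, not ZLUDA's actual code: map a handful
# of CUDA runtime symbols to their HIP counterparts, the way a drop-in
# replacement library would forward calls to ROCm.

CUDA_TO_HIP = {
    "cudaMalloc":            "hipMalloc",
    "cudaMemcpy":            "hipMemcpy",
    "cudaFree":              "hipFree",
    "cudaLaunchKernel":      "hipLaunchKernel",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

def translate(cuda_symbol: str) -> str:
    """Return the HIP counterpart of a CUDA runtime symbol (tiny illustrative subset)."""
    return CUDA_TO_HIP.get(cuda_symbol, f"<no mapping for {cuda_symbol}>")

print(translate("cudaMalloc"))  # -> hipMalloc
```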

NVIDIA CG100 "Grace" Server Processor Benchmarked by Academics

The Barcelona Supercomputing Center (BSC) and the State University of New York (Stony Brook and Buffalo campuses) have pitted NVIDIA's relatively new CG100 "Grace" Superchip against several rival products in a "wide variety of HPC and AI benchmarks." Team Green marketing material has focused mainly on the overall GH200 "Grace Hopper" package—so it is interesting to see technical institutes concentrate on the company's "first true" server processor (ARM-based), rather than the ever popular GPU aspect. The Next Platform's article summarized the chip's internal makeup: "(NVIDIA's) Grace CPU has a relatively high core count and a relatively low thermal footprint, and it has banks of low-power DDR5 (LPDDR5) memory—the kind used in laptops but gussied up with error correction to be server class—of sufficient capacity to be useful for HPC systems, which typically have 256 GB or 512 GB per node these days and sometimes less."

Benchmark results were revealed at last week's HPC Asia 2024 conference (in Nagoya, Japan)—Barcelona Supercomputing Center (BSC) and the State University of New York also uploaded their findings to the ACM Digital Library (link #1 & #2). BSC's MareNostrum 5 system contains an experimental cluster portion—consisting of NVIDIA Grace-Grace and Grace-Hopper superchips. We have heard plenty about the latter (in press releases), but the former is a novel concept—as outlined by The Next Platform: "Put two Grace CPUs together into a Grace-Grace superchip, a tightly coupled package using NVLink chip-to-chip interconnects that provide memory coherence across the LPDDR5 memory banks and that consumes only around 500 watts, and it gets plenty interesting for the HPC crowd. That yields a total of 144 Arm Neoverse "Demeter" V2 cores with the Armv9 architecture, and 1 TB of physical memory with 1.1 TB/sec of peak theoretical bandwidth. For some reason, probably relating to yield on the LPDDR5 memory, only 960 GB of that memory capacity and only 1 TB/sec of that memory bandwidth is actually available."

Intel Core i9-14900T Geekbenched - Comparable to AMD Ryzen 9 7900

Intel's Core i9-14900T processor was "officially" released last month alongside an expanded population of "Raptor Lake Refresh" products—the T-class alternative to Team Blue's flagship desktop Core i9-14900 CPU is a less glamorous prospect, hence almost zero press coverage and tech reviews. Its apparent lack of visibility is not helped by non-existent availability at retail, despite inclusion in Team Blue's second wave of 14th Generation Core processors (Marketing Status = Launched). The Core i9-14900 (non-K) is readily obtainable around the globe, as a lower-power alternative to the ever-greedy Core i9-14900K, but their T-class SKU sibling takes frugality to another level. TPU's resident CPU tester, W1zzard, implemented six distinct power limit settings during an i9-14900K supplemental experiment, with the lowest being 35 W—coincidentally, matching the i9-14900T's default base power.

His simulated findings were not encouraging, to say the least, but late last week BenchLeaks noticed that a lone test system had gauged the T-class part's efficiency-oriented processing prowess. Geekbench 6.2.2 results were generated by an ASRock Z790 PG-ITX/TB4 build (with 64 GB of 5586 MT/s DDR5 SDRAM)—scoring 3019 in the overall single-core category, and 16,385 in multi-core stakes. The latter score indicates a 22% performance penalty when referenced against Tom's Hardware's Geekbenched i9-14900K sample. The publication reckons that these figures place Intel's Core i9-14900T CPU in good company—notably AMD's Ryzen 9 7900 processor, one of the company's trio of 65 W "non-X" SKUs. Last March, W1zzard was suitably impressed by his review sample's "fantastic energy efficiency"—the Geekbench 6 official scoreboard awards it 2823 (single-core) and 16,750 (multi-core) based on aggregated data from multiple submissions.

AMD Ryzen 7 8700G AI Performance Enhanced by Overclocked DDR5 Memory

We already know about the AMD Ryzen 7 8700G APU's enjoyment of overclocked memory—early reviews demonstrated the graphical benefits granted by fiddling with the "iGPU engine clock and the processor's memory frequency." While gamers can enjoy a boosted integrated graphics solution that is comparable in 1080p performance to a discrete Radeon RX 6500 XT GPU, AI enthusiasts are eager to experiment with the "Hawk Point" part's Radeon 780M IGP and Neural Processing Unit (NPU)—the first-generation Ryzen XDNA inference engine can unleash up to 16 AI TOPS. One individual, chi11eddog, posted their findings through social media channels earlier today, coinciding with the official launch of Ryzen 8000G processors. The initial set of results concentrated on the Radeon 780M aspect; NPU-centric data may arrive at a later date.

They performed quick tests on AMD's freshly released Ryzen 7 8700G desktop processor, combined with an MSI B650 Gaming Plus WiFi motherboard and two sticks of 16 GB DDR5-4800 memory. The MSI-exclusive "Memory Try It" feature was deployed to reach and gauge several higher system RAM frequency settings further up the tables. Here is chi11eddog's succinct interpretation of the benchmark results: "7600 MT/s is 15% faster than 4800 MT/s in UL Procyon AI Inference Benchmark and 4% faster in GIMP with Stable Diffusion." The processor's default memory state is capable of producing 210 Float32 TOPS, according to chi11eddog's inference chart. The 6000 MT/s setting produces a 7% improvement over baseline, while 7200 MT/s drives proceedings to 11%—the flagship APU's Radeon 780M iGPU appears to be quite dependent on bandwidth. Their GIMP w/ Stable Diffusion benchmarks also taxed the integrated RDNA 3 graphics solution—again, it was deemed to be fairly bandwidth hungry.
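The bandwidth dependence is easy to rationalize: the iGPU shares system memory, so its available bandwidth scales directly with the DDR5 transfer rate. A minimal sketch of theoretical peak dual-channel DDR5 bandwidth at the tested settings (real-world throughput will be lower than these peak figures):

```python
# Sketch: theoretical peak bandwidth of a dual-channel DDR5 configuration at
# the transfer rates tested above (two 64-bit channels).

def dual_channel_bandwidth_gbs(transfer_rate_mts: int) -> float:
    bytes_per_transfer_per_channel = 8  # 64-bit channel
    channels = 2
    return transfer_rate_mts * bytes_per_transfer_per_channel * channels / 1000

for mts in (4800, 6000, 7200, 7600):
    print(f"DDR5-{mts}: ~{dual_channel_bandwidth_gbs(mts):.1f} GB/s peak")
# -> DDR5-4800: ~76.8 GB/s peak ... DDR5-7600: ~121.6 GB/s peak
```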

AMD Ryzen 7 8700G & Ryzen 5 8600G APUs Geekbenched

AMD announced its Ryzen 8000G series of Zen 4-based desktop APUs earlier this month, with an official product launch date: January 31. The top models within this range are the "Hawk Point" Ryzen 7 8700G and Ryzen 5 8600G processors—Olrak29_ took to social media after spotting pre-release examples popping up on the Geekbench Browser database. It is highly likely that evaluation samples are in the hands of reviewers, and more benchmark results are expected to be uploaded over the next week and a half. The Ryzen 7 8700G (w/ Radeon 780M Graphics) was benched on an ASUS ROG STRIX B650-A GAMING WIFI board with 32 GB (6398 MT/s) of DDR5 system memory. Leaked figures appeared online last weekend, originating from a Ryzen 5 8600G (w/ Radeon 760M Graphics) paired with an MSI B650 GAMING PLUS WIFI (MS-7E26) motherboard and 32 GB (6400 MT/s) of DDR5 RAM.

The Geekbench 6 results reveal that the Ryzen 7 8700G and Ryzen 5 8600G APUs are slightly less performant than "Raphael" Ryzen 7000 non-X processors—not a massive revelation, given the underlying technological similarities between these AMD product lines. Evaluations could change with the publication of official review data, but the 8000G series is at a natural disadvantage here—lower core clock frequencies and smaller L3 cache designations are the likely culprits. The incoming APUs are also somewhat hobbled with PCIe support only reaching 4.0 standards. VideoCardz, Tom's Hardware and Wccftech have taken the time to compile the leaked Geekbench 6 results into handy comparison charts—very much worth checking out.

AMD Ryzen Threadripper Pro 7995WX & 7975WX Specs Leaked

A pair of Dell Precision workstations have been tested in SiSoftware's Sandra benchmark suite—two database entries for the 7875 Tower (Dell 00RP38) reveal specifications of next-generation AMD Ryzen Threadripper Pro CPUs. The 32-core 7975WX model was outed a couple of weeks ago, but the Sandra benchmark database has been updated with additional scores. Its newly leaked sibling is getting a lot of attention—the recently benchmarked 7995WX sample appears to possess 96 Zen 4 cores and 192 threads (via SMT), with a 5.14 GHz maximum single-core boost clock. Tom's Hardware is intrigued by benchmark data showing that the CPU has: "a 3.2 GHz all-core turbo frequency."

There are 12 CCDs onboard, with a combined total of 384 MB of L3 cache (each CCD has access to 32 MB of L3)—therefore Wccftech believes that "this chip is based on the Genoa SP5 die and will adopt the top 8-channel and SP5 socket platform. The chip also features 96 MB of L2 cache and the top clock speed was reported at 5.14 GHz." The repeat benched Ryzen Threadripper Pro 7975WX CPU is slightly less exciting—with 32 Zen 4 cores, 64 threads, 128 MB of L3 cache, and 32 MB of L2 cache. According to older information, this model is believed to have a TDP rating of 350 W and apparent clock speeds peaking at 4.0 GHz—Wccftech reckons that this frequency reflects an all-core boost. They have produced a bunch of comparative performance charts and further analysis—well worth checking out.

NVIDIA GH200 Superchip Aces MLPerf Inference Benchmarks

In its debut on the MLPerf industry benchmarks, the NVIDIA GH200 Grace Hopper Superchip ran all data center inference tests, extending the leading performance of NVIDIA H100 Tensor Core GPUs. The overall results showed the exceptional performance and versatility of the NVIDIA AI platform from the cloud to the network's edge. Separately, NVIDIA announced inference software that will give users leaps in performance, energy efficiency and total cost of ownership.

GH200 Superchips Shine in MLPerf
The GH200 links a Hopper GPU with a Grace CPU in one superchip. The combination provides more memory, bandwidth and the ability to automatically shift power between the CPU and GPU to optimize performance. Separately, NVIDIA HGX H100 systems that pack eight H100 GPUs delivered the highest throughput on every MLPerf Inference test in this round. Grace Hopper Superchips and H100 GPUs led across all MLPerf's data center tests, including inference for computer vision, speech recognition and medical imaging, in addition to the more demanding use cases of recommendation systems and the large language models (LLMs) used in generative AI.