News Posts matching #MUSA

Moore Threads MTLink Scales Up to 10,000 Home-Grown GPUs in AI Cluster

Chinese GPU manufacturer Moore Threads has announced a significant upgrade to its KUAE data center server. The company now has the ability to connect up to 10,000 GPUs in a single cluster, marking a huge leap in its scale-out capabilities for artificial intelligence and high-performance computing applications. The enhanced KUAE server incorporates eight MTT S4000 GPUs, leveraging Moore Threads' proprietary MTLink interconnect technology. These GPUs, based on the MUSA architecture, each feature 128 tensor cores and 48 GB of GDDR6 memory, delivering a bandwidth of 768 GB/s. While the full performance metrics of a 10,000-GPU cluster remain undisclosed, the sheer scale of 1,280,000 tensor cores suggests decent computing potential. Moore Threads' GPUs currently lag behind NVIDIA's GPU offerings in terms of performance. However, the company claims its MTT S4000 remains competitive against certain NVIDIA models, particularly in large language model training and inference tasks.
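
The cluster-level totals follow directly from the per-GPU figures quoted above. A minimal sketch of that arithmetic, assuming the cluster is built entirely from eight-GPU KUAE servers and using decimal terabytes (the function and constant names are illustrative, not from Moore Threads):

```python
# Per-GPU and per-server figures as quoted in the article.
GPUS_PER_SERVER = 8
TENSOR_CORES_PER_GPU = 128
MEMORY_PER_GPU_GB = 48
MAX_GPUS = 10_000


def cluster_totals(gpus=MAX_GPUS):
    """Return aggregate figures for a cluster of `gpus` MTT S4000 cards.

    Assumes every server is fully populated with 8 GPUs and that
    1 TB = 1000 GB; both are simplifications for illustration.
    """
    return {
        "servers": gpus // GPUS_PER_SERVER,
        "tensor_cores": gpus * TENSOR_CORES_PER_GPU,
        "memory_tb": gpus * MEMORY_PER_GPU_GB / 1000,
    }


totals = cluster_totals()
print(totals)
```

Under these assumptions a full 10,000-GPU cluster corresponds to 1,250 servers, 1,280,000 tensor cores, and 480 TB of GDDR6 in aggregate.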

The Chinese company faces significant challenges due to its inclusion on the U.S. Department of Commerce's Entity List, which restricts its access to advanced manufacturing processes. Despite these obstacles, the firm has secured partnerships with major Chinese state-run telecom operators and technology companies, focusing on developing new computing cluster projects. A recent financing round that raised approximately $343.7 million will help fund Moore Threads' ambitious expansion plans. However, limited access to cutting-edge semiconductor fabrication technologies may constrain the company's future growth. Nonetheless, creating a scale-out server infrastructure with up to 10,000 GPUs is vital for LLM training and inference, especially as Chinese AI labs catch up to Western labs in terms of the performance of their AI models.

Moore Threads MTT S80 dGPU Struggles to Keep Up with Modern Radeon iGPUs

The Moore Threads MTT S80 first attracted wider media attention last summer when it was introduced as the world's first PCIe Gen 5 gaming graphics card. Unfortunately, its performance in gaming benchmarks did not match early expectations, especially for a 200 W TDP-rated unit with 4096 "MUSA" cores. Evaluators found that driver issues have limited the full potential of MTT GPUs, and it is speculated that Moore Threads has simply repurposed an existing PowerVR architecture under its in-house design, "Chunxiao." The Chinese firm has concentrated on driver improvements in the interim: mid-February experiments indicated 100% performance boosts for MTT S80 and S70 discrete GPUs courtesy of driver version 240.90. Germany's ComputerBase managed to import Moore Threads MTT S80 and S30 models for testing, in an effort to corroborate performance figures recently published by Asian review outlets.

The Moore Threads MTT S80 (discounted to $164 last October) was likely designed with MMO gamers in mind. VideoCardz (based on ComputerBase findings) discussed the card's struggles when weighed against Team Red's modern-day integrated solutions: "S80 falls short when compared to the Ryzen 5 8600G, featuring the Radeon 760M iGPU with RDNA 3 graphics. A geometric mean across various titles reveals the S80's lag, but there are exceptions, like DOTA 2, where it takes the lead in framerate. It's clear that MTT GPUs (have a) less emphasized focus on supporting AAA titles." ComputerBase confirmed that DirectX 12 API support is still lacking, meaning that many popular Western game titles remain untested on the Moore Threads MTT S80 graphics card. The freshly launched entry-level MTT S30 card produced "1/4 of the performance" when compared to its flagship sibling.
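
The quoted comparison rests on a geometric mean across titles, which lets a card lose overall while still winning an individual game. A minimal sketch of how such a figure can be computed, assuming per-title average frame rates are available; the titles and numbers below are placeholders for illustration, not ComputerBase's measurements:

```python
from statistics import geometric_mean


def relative_performance(card_fps, baseline_fps):
    """Geometric mean of per-title FPS ratios (card / baseline).

    Only titles present in both result sets are compared. A value
    below 1.0 means the card is slower than the baseline overall.
    """
    ratios = [
        card_fps[title] / baseline_fps[title]
        for title in card_fps
        if title in baseline_fps
    ]
    return geometric_mean(ratios)


# Placeholder frame rates, invented purely to show the calculation.
dgpu = {"Game A": 40.0, "Game B": 30.0, "Game C": 90.0}
igpu = {"Game A": 60.0, "Game B": 50.0, "Game C": 80.0}

print(f"relative performance: {relative_performance(dgpu, igpu):.2f}")
```

Because the geometric mean multiplies ratios rather than adding frame rates, one high-FPS title cannot dominate the result the way it would in an arithmetic average.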

Moore Threads Launches MTT S4000 48 GB GPU for AI Training/Inference and Presents 1000-GPU Cluster

Chinese chipmaker Moore Threads has launched its first domestically-produced 1000-card AI training cluster, dubbed the KUAE Intelligent Computing Center. A central part of the KUAE cluster is Moore Threads' new MTT S4000 accelerator card with 48 GB of VRAM and 768 GB/s of memory bandwidth, built on the company's third-generation MUSA GPU architecture. In FP32, the card can output 25 TeraFLOPS; in TF32, it can achieve 50 TeraFLOPS; and in FP16/BF16, up to 200 TeraFLOPS. Also supported is INT8 at 200 TOPS. The MTT S4000 focuses on both training and inference, leveraging Moore Threads' high-speed MTLink 1.0 intra-system interconnect to scale cards for distributed model-parallel training of models with hundreds of billions of parameters. The card also provides graphics, video encoding/decoding, and 8K display capabilities for graphics workloads. Moore Threads' KUAE cluster combines the S4000 GPU hardware with RDMA networking, distributed storage, and integrated cluster management software. The KUAE Platform oversees multi-datacenter resource allocation and monitoring. KUAE ModelStudio hosts training frameworks and model repositories to streamline development.
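
The per-card throughput figures can be scaled up to give a theoretical ceiling for the 1000-card cluster. A minimal sketch, assuming perfect scaling and no interconnect or memory bottlenecks, so the results are upper bounds rather than measured performance (the names are illustrative):

```python
# Peak per-card throughput as quoted in the article, in TFLOPS.
PEAK_TFLOPS_PER_CARD = {
    "FP32": 25,
    "TF32": 50,
    "FP16/BF16": 200,
}
CARDS_IN_CLUSTER = 1000


def cluster_peak_pflops(cards=CARDS_IN_CLUSTER):
    """Theoretical aggregate peak in PFLOPS for each precision.

    Simply multiplies per-card peak by card count; real workloads
    will land below these numbers.
    """
    return {
        precision: tflops * cards / 1000
        for precision, tflops in PEAK_TFLOPS_PER_CARD.items()
    }


for precision, pflops in cluster_peak_pflops().items():
    print(f"{precision}: {pflops:g} PFLOPS peak")
```

On paper, that gives 25 PFLOPS in FP32, 50 PFLOPS in TF32, and 200 PFLOPS in FP16/BF16 for the full cluster.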

With integrated solutions now proven at the scale of a thousand GPUs, Moore Threads is positioned to power ubiquitous intelligent applications, from scientific computing to the metaverse. The KUAE cluster reportedly achieves near-linear scaling, at 91% efficiency. Taking a training data volume of 200 billion as an example, Zhiyuan Research Institute's 70-billion-parameter Aquila2 can complete training in 33 days; a model with 130 billion parameters can complete training in 56 days on the KUAE cluster. In addition, the Moore Threads KUAE kilocard cluster supports long-term continuous and stable operation, supports resuming training from a breakpoint, and performs asynchronous checkpointing in less than 2 minutes. On the software side, Moore Threads also claims full compatibility with NVIDIA's CUDA framework: its MUSIFY tool translates CUDA code to the MUSA GPU architecture at supposedly zero migration cost, i.e., with no performance penalty.
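
The article does not define how the 91% scaling figure was measured. A minimal sketch of one common interpretation, assuming it is the ratio of measured cluster throughput to ideal linear throughput (the function names and sample throughput values are hypothetical):

```python
def scaling_efficiency(single_gpu_throughput, cluster_throughput, num_gpus):
    """Measured cluster throughput divided by ideal linear throughput.

    Throughput can be in any consistent unit, e.g. samples per second.
    """
    ideal = single_gpu_throughput * num_gpus
    return cluster_throughput / ideal


def effective_gpus(num_gpus, efficiency):
    """Number of ideally scaling GPUs the cluster is equivalent to."""
    return num_gpus * efficiency


# Hypothetical throughput numbers chosen to reproduce the 91% figure.
eff = scaling_efficiency(
    single_gpu_throughput=100.0,
    cluster_throughput=91_000.0,
    num_gpus=1000,
)
print(f"efficiency: {eff:.0%}, effective GPUs: {effective_gpus(1000, eff):.0f}")
```

Read this way, a 1000-card cluster at 91% efficiency delivers roughly the throughput of 910 cards scaling perfectly.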

Moore Threads Unveils MTT S60 & MTT S2000 Graphics Cards with DirectX Support

Chinese company Moore Threads has unveiled its MTT GPU series just 18 months after the company's establishment in 2020. The MT Unified System Architecture (MUSA) is the first from any Chinese company to be developed fully domestically and includes support for DirectX, OpenCL, OpenGL, Vulkan, and CUDA. The company announced the MTT S60 and MTT S2000 single-slot desktop graphics cards for gaming and server applications at a recent event. The MTT S60 is manufactured on a 12 nm node and features 2,048 MUSA cores paired with 8 GB of LPDDR4X memory, offering 6 TFLOPS of performance. The MTT S2000 is also manufactured on a 12 nm node and doubles the number of MUSA cores to 4,096, paired with 32 GB of video memory of an undisclosed type, allowing it to reach 12 TFLOPS.

Moore Threads joins Intel in supporting AV1 encoding on a consumer GPU with MUSA cards featuring H.264, H.265, and AV1 encoding support in addition to H.264, H.265, AV1, VP8, and VP9 decoding. The company is also developing a physics engine dubbed Alphacore which is said to work with existing tools such as Unity, Unreal Engine, and Houdini to accelerate physics performance by 5 to 10 times. The only gaming performance shown was a simple demonstration of the MTT S60 running League of Legends at 1080p without any frame rate details.