News Posts matching #MTT S4000


Moore Threads MTLink Scales Up to 10,000 Home-Grown GPUs in AI Cluster

Chinese GPU manufacturer Moore Threads has announced a significant upgrade to its KUAE data center server. The company can now connect up to 10,000 GPUs in a single cluster, marking a huge leap in its scale-out capabilities for artificial intelligence and high-performance computing applications. The enhanced KUAE server incorporates eight MTT S4000 GPUs, leveraging Moore Threads' proprietary MTLink interconnect technology. These GPUs, based on the MUSA architecture, each feature 128 tensor cores and 48 GB of GDDR6 memory with 768 GB/s of memory bandwidth. While the full performance metrics of a 10,000-GPU cluster remain undisclosed, the sheer scale of 1,280,000 tensor cores suggests substantial compute potential. Moore Threads' GPUs currently lag behind NVIDIA's offerings in raw performance; however, the company claims the MTT S4000 remains competitive against certain NVIDIA models, particularly in large language model training and inference tasks.
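To put those per-card numbers in perspective, the sketch below tallies the theoretical aggregate resources of a 10,000-GPU KUAE cluster. It uses only the figures quoted above; the variable names and the assumption of simple linear summation (ignoring interconnect overhead) are ours, not Moore Threads'.

```python
# Back-of-the-envelope aggregates for a 10,000-GPU KUAE cluster, using
# only the per-card MTT S4000 figures quoted in the article. Linear
# summation is an illustrative assumption, not a vendor claim.

gpus = 10_000
tensor_cores_per_gpu = 128
vram_gb_per_gpu = 48
mem_bw_gbs_per_gpu = 768

print(f"Tensor cores: {gpus * tensor_cores_per_gpu:,}")                  # 1,280,000
print(f"Total VRAM: {gpus * vram_gb_per_gpu / 1000:.0f} TB")             # 480 TB
print(f"Aggregate memory bandwidth: {gpus * mem_bw_gbs_per_gpu / 1000:,.0f} TB/s")  # 7,680 TB/s
```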

The Chinese company faces significant challenges due to its inclusion on the U.S. Department of Commerce's Entity List, which restricts its access to advanced manufacturing processes. Despite these obstacles, the firm has secured partnerships with major Chinese state-run telecom operators and technology companies, focusing on developing new computing cluster projects. A recent financing round that raised approximately $343.7 million will help fund Moore Threads' ambitious expansion plans, although limited access to cutting-edge semiconductor fabrication technologies may constrain the company's future growth. Nonetheless, a scale-out server infrastructure with up to 10,000 GPUs is vital for LLM training and inference, especially as Chinese AI labs close the performance gap with their Western counterparts.

Moore Threads Launches MTT S4000 48 GB GPU for AI Training/Inference and Presents 1000-GPU Cluster

Chinese chipmaker Moore Threads has launched its first domestically-produced 1,000-card AI training cluster, dubbed the KUAE Intelligent Computing Center. A central part of the KUAE cluster is Moore Threads' new MTT S4000 accelerator card, which pairs 48 GB of VRAM and 768 GB/s of memory bandwidth with the company's third-generation MUSA GPU architecture. The card delivers 25 TeraFLOPS in FP32, 50 TeraFLOPS in TF32, and up to 200 TeraFLOPS in FP16/BF16; INT8 is also supported at 200 TOPS. The MTT S4000 targets both training and inference, leveraging Moore Threads' high-speed MTLink 1.0 intra-system interconnect to scale across cards for distributed parallel training of models with hundreds of billions of parameters. The card also provides graphics, video encoding/decoding, and 8K display capabilities for graphics workloads. Moore Threads' KUAE cluster combines the S4000 GPU hardware with RDMA networking, distributed storage, and integrated cluster management software. The KUAE Platform oversees multi-datacenter resource allocation and monitoring, while KUAE ModelStudio hosts training frameworks and model repositories to streamline development.
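For reference, the quoted per-card throughput figures and their theoretical cluster-wide aggregates can be laid out as follows. This is a minimal sketch assuming ideal linear scaling across 1,000 cards; real distributed training falls short of these peaks, as the scaling figures below suggest.

```python
# Peak per-card throughput of the MTT S4000 as quoted in the article,
# and the no-overhead theoretical aggregate for a 1,000-card KUAE
# cluster. The ideal-scaling assumption is illustrative only.

S4000_PEAK = {
    "FP32":      25e12,   # 25 TFLOPS
    "TF32":      50e12,   # 50 TFLOPS
    "FP16/BF16": 200e12,  # 200 TFLOPS
    "INT8":      200e12,  # 200 TOPS
}

CARDS = 1_000
for precision, ops in S4000_PEAK.items():
    print(f"{precision:10s} {ops / 1e12:5.0f} tera-ops/card"
          f"  ->  {CARDS * ops / 1e15:5.0f} peta-ops cluster-wide")
```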

With integrated solutions now proven at the thousand-GPU scale, Moore Threads is positioned to power ubiquitous intelligent applications, from scientific computing to the metaverse. The KUAE cluster reportedly achieves near-linear scaling with 91% efficiency. Taking 200 billion tokens of training data as an example, the Zhiyuan Research Institute's 70-billion-parameter Aquila2 can complete training in 33 days, while a 130-billion-parameter model can complete training in 56 days on the KUAE cluster. In addition, the Moore Threads KUAE kilocard (1,000-card) cluster supports long-term continuous, stable operation and breakpoint resume training, with asynchronous checkpointing that completes in under two minutes. On the software side, Moore Threads also boasts full compatibility with NVIDIA's CUDA framework: its MUSIFY tool translates CUDA code to the MUSA GPU architecture at supposedly zero migration cost, i.e., with no performance penalty from porting.
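As a rough sanity check on those reported training times, the sketch below applies the widely used ~6·N·D FLOPs approximation for dense transformer training (roughly 6 FLOPs per parameter per token). The approximation, the assumption that training runs in BF16 across all 1,000 cards, and the derived utilization figures are our own estimates, not Moore Threads data; they merely show that the quoted 33- and 56-day figures are plausible at realistic hardware utilization.

```python
# Sanity check of the quoted training times using the common ~6*N*D
# FLOPs approximation for dense transformer training. The approximation
# and the derived utilization are our own estimates, not vendor data.

CARDS = 1_000
PEAK_BF16 = 200e12          # per-card peak FLOP/s, from the spec above
TOKENS = 200e9              # 200 billion training tokens, as quoted

for params, days_reported in [(70e9, 33), (130e9, 56)]:
    train_flops = 6 * params * TOKENS
    days_at_peak = train_flops / (CARDS * PEAK_BF16) / 86_400
    utilization = days_at_peak / days_reported
    print(f"{params / 1e9:.0f}B params: {days_at_peak:.1f} days at peak; "
          f"reported {days_reported} days implies ~{utilization:.0%} utilization")
# Output: ~15% and ~16% hardware utilization, respectively -- within the
# range commonly seen for large distributed training runs.
```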