News Posts matching #Machine Learning

Hewlett Packard Enterprise Brings HPE Cray EX and HPE Cray XD Supercomputers to Enterprise Customers

Hewlett Packard Enterprise (NYSE: HPE) today announced it is making supercomputing accessible for more enterprises to harness insights, solve problems and innovate faster by delivering its world-leading, energy-efficient supercomputers in a smaller form factor and at a lower price point.

The expanded portfolio includes new HPE Cray EX and HPE Cray XD supercomputers, which are based on HPE's exascale innovation that delivers end-to-end, purpose-built technologies in compute, accelerated compute, interconnect, storage, software, and flexible power and cooling options. The supercomputers provide significant performance and AI-at-scale capabilities to tackle demanding, data-intensive workloads, speed up AI and machine learning initiatives, and accelerate innovation to deliver products and services to market faster.

Inventec's Rhyperior Is the Powerhouse GPU Accelerator System Every Business in the AI And ML World Needs

Taiwan-based leading server manufacturer Inventec's powerhouse GPU accelerator system, Rhyperior, is everything a modern business needs in the digital era, especially businesses relying heavily on Artificial Intelligence (AI) and Machine Learning (ML). A unique and optimal combination of GPUs and CPUs, this 4U GPU accelerator system is based on the NVIDIA A100 Tensor Core GPU and 3rd Gen Intel Xeon processors (Whitley platform). Rhyperior also incorporates NVIDIA NVSwitch to dramatically enhance performance, making it an effective tool for modern workloads.

In a world where technology is disrupting our lives as we know them, GPU acceleration is critical: it speeds up processes that would otherwise take much longer. Acceleration boosts execution of complex computational problems that can be broken down into similar, parallel operations. In other words, a good accelerator can be a game changer for industries like gaming and healthcare, which increasingly rely on technologies like AI and ML to deliver better, more robust solutions to consumers.
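As a concrete illustration of the kind of problem that maps well onto an accelerator, here is a minimal sketch (plain NumPy on the CPU, not tied to Rhyperior or any particular GPU): the same independent operation applied to millions of elements, which is exactly the structure GPUs exploit.

```python
# Minimal illustration of a data-parallel workload: the same independent
# operation applied to every element. This runs on the CPU with NumPy purely
# to show the structure of the problem; on a GPU the per-element work would
# be spread across thousands of cores.
import numpy as np

def scale_and_offset_loop(x, a, b):
    # Serial version: each element is processed one after another.
    out = np.empty_like(x)
    for i in range(len(x)):
        out[i] = a * x[i] + b
    return out

def scale_and_offset_vectorized(x, a, b):
    # Vectorized version: one bulk operation the hardware can parallelize.
    return a * x + b

x = np.random.rand(1_000_000)
assert np.allclose(scale_and_offset_loop(x, 2.0, 1.0),
                   scale_and_offset_vectorized(x, 2.0, 1.0))
```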

CXL Consortium Releases Compute Express Link 3.0 Specification to Expand Fabric Capabilities and Management

The CXL Consortium, an industry standards body dedicated to advancing Compute Express Link (CXL) technology, today announced the release of the CXL 3.0 specification. The CXL 3.0 specification expands on previous technology generations to increase scalability and to optimize system level flows with advanced switching and fabric capabilities, efficient peer-to-peer communications, and fine-grained resource sharing across multiple compute domains.

"Modern datacenters require heterogenous and composable architectures to support compute intensive workloads for applications such as Artificial Intelligence and Machine Learning - and we continue to evolve CXL technology to meet industry requirements," said Siamak Tavallaei, president, CXL Consortium. "Developed by our dedicated technical workgroup members, the CXL 3.0 specification will enable new usage models in composable disaggregated infrastructure."

Cerebras Systems Sets Record for Largest AI Models Ever Trained on A Single Device

Cerebras Systems, the pioneer in high-performance artificial intelligence (AI) computing, today announced, for the first time ever, the ability to train models with up to 20 billion parameters on a single CS-2 system - a feat not possible on any other single device. By enabling a single CS-2 to train these models, Cerebras reduces the system engineering time necessary to run large natural language processing (NLP) models from months to minutes. It also eliminates one of the most painful aspects of NLP - namely, the partitioning of the model across hundreds or thousands of small graphics processing units (GPUs).

"In NLP, bigger models are shown to be more accurate. But traditionally, only a very select few companies had the resources and expertise necessary to do the painstaking work of breaking up these large models and spreading them across hundreds or thousands of graphics processing units," said Andrew Feldman, CEO and Co-Founder of Cerebras Systems. "As a result, only very few companies could train large NLP models - it was too expensive, time-consuming and inaccessible for the rest of the industry. Today we are proud to democratize access to GPT-3 1.3B, GPT-J 6B, GPT-3 13B and GPT-NeoX 20B, enabling the entire AI ecosystem to set up large models in minutes and train them on a single CS-2."

SMART Modular Announces the SMART Kestral PCIe Optane Memory Add-in-Card to Enable Memory Expansion and Acceleration

SMART Modular Technologies, Inc. ("SMART"), a division of SGH and a global leader in memory solutions, solid-state drives, and hybrid storage products, announces its new SMART Kestral PCIe Optane Memory Add-in-Card (AIC), which adds up to 2 TB of Optane memory expansion over a PCIe Gen4 x16 or PCIe Gen3 x16 interface, independently of the motherboard CPU. SMART's Kestral AICs accelerate selected algorithms by offloading software-defined storage functions from the host CPU to the Intel FPGA on the AIC. SMART's Kestral memory AICs are ideal for hyperscale, data center, and similar environments that run large memory applications and would benefit from memory or system acceleration through computational storage.

"With the advancement of new interconnect standards such as CXL and OpenCAPI, SMART's new family of SMART Kestral AICs addresses the industry's need for a variety of new memory module form factors and interfaces for memory expansion and acceleration," stated Mike Rubino, SMART Modular's vice president of engineering. "SMART is able to leverage our many years of experience in developing and productizing controller-based memory solutions to meet today's emerging and continually evolving memory add-on needs of server and storage system customers."

Supermicro Breakthrough Universal GPU System - Supports All Major CPU, GPU, and Fabric Architectures

Super Micro Computer, Inc. (SMCI), a global leader in enterprise computing, storage, networking solutions, and green computing technology, has announced a revolutionary technology that simplifies large-scale GPU deployments with a future-proof design that supports yet-to-be-announced technologies. The Universal GPU server provides the ultimate flexibility in a resource-saving server.

The Universal GPU system architecture combines the latest technologies supporting multiple GPU form factors, CPU choices, storage, and networking options, optimized together to deliver uniquely configured and highly scalable systems. Systems can be optimized for each customer's specific Artificial Intelligence (AI), Machine Learning (ML), and High-Performance Computing (HPC) applications. Organizations worldwide are demanding new options for their next generation of computing environments, ones with the thermal headroom for the next generation of CPUs and GPUs.

Google Uses Artificial Intelligence to Develop Faster and Smaller Hardware Accelerators

Designing Artificial Intelligence / Machine Learning hardware accelerators takes effort from hardware engineers working in conjunction with scientists in the AI/ML field itself. A few years ago, we started seeing AI incorporated into parts of electronic design automation (EDA) software tools, helping chip designers speed up the process of creating hardware. What we were used to seeing AI do was limited to a couple of tasks, such as placement and routing - and having those automated is a huge deal. However, the role of AI in chip design is not going to stop there. Researchers at Google and UC Berkeley have developed a research project in which AI designs AI-tailored accelerators that are smaller and faster than anything humans have made.

In the published paper, the researchers present PRIME - a framework that creates AI processors from a database of existing blueprints. The PRIME framework draws on an offline database containing accelerator designs and their corresponding performance metrics (e.g., latency, power) to design next-generation hardware accelerators. According to Google, PRIME can do so without further hardware simulation and produces processors that are ready for use. As per the paper, PRIME improves performance over state-of-the-art simulation-driven methods by as much as 1.2x-1.5x, while reducing the required total simulation time by 93% and 99%, respectively. The framework is also capable of architecting accelerators for unseen applications.
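The core idea, learning from logged (design, performance) records instead of running new simulations, can be sketched in a few lines. The snippet below is a heavily simplified, generic illustration of offline surrogate-based design selection; the feature names and the model choice are assumptions for illustration, and this is not the actual PRIME method:

```python
# Simplified sketch of offline, data-driven accelerator design search:
# fit a surrogate on logged (design parameters -> latency) records, then
# score new candidate designs without running any further simulation.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Offline log: each row is [num_pe, sram_kb, bus_width]; target is latency in ms.
designs = np.array([[64, 512, 128], [128, 1024, 256], [256, 2048, 256], [64, 2048, 512]])
latency = np.array([9.1, 6.4, 5.0, 7.2])

surrogate = RandomForestRegressor(n_estimators=200, random_state=0).fit(designs, latency)

# Score unseen candidates purely with the learned surrogate (no simulator calls).
candidates = np.array([[128, 2048, 256], [256, 1024, 512]])
pred = surrogate.predict(candidates)
best = candidates[np.argmin(pred)]
print("predicted latencies:", pred, "-> pick", best)
```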

Tanzanite Silicon Solutions Demonstrates Industry's First CXL Based Memory Expansion and Memory Pooling Products

Tanzanite Silicon Solutions Inc., the leader in the development of Compute Express Link (CXL) based products, is unveiling its architectural vision and product roadmap with an SoC mapped to an FPGA proof-of-concept vehicle demonstrating memory expansion and memory pooling with multi-host CXL-based connectivity. Explosive demand for memory and compute to meet the needs of emerging applications such as Artificial Intelligence (AI), Machine Learning (ML), blockchain technology, and the metaverse is outpacing monolithic systems. A disaggregated data center design with composable components for CPU, memory, storage, GPU, and XPU is needed to provide flexible and dynamic pooling of resources to meet the varying demands of heterogeneous workloads in an optimal and efficient manner.

Tanzanite's visionary TanzanoidTZ architecture and purpose-built "Smart Logic Interface Connector" (SLICTZ) SoC enable independent scaling and sharing of memory and compute in a pool, with low latency within and across server racks. The Tanzanite solution provides a highly scalable architecture for exascale-level memory capacity and compute acceleration, supporting multiple industry-standard form factors, including E1.S, E3.S, memory expansion boards, and memory appliances.

Ceremorphic Exits Stealth Mode; Unveils Technology Plans to Deliver a New Architecture Specifically Designed for Reliable Performance Computing

Armed with more than 100 patents and leveraging multi-decade expertise in creating industry-leading silicon systems, Ceremorphic Inc. today announced its plans to deliver a complete silicon system that provides the performance needed for next-generation applications such as AI model training, HPC, automotive processing, drug discovery, and metaverse processing. Designed in advanced silicon geometry (TSMC 5 nm node), this new architecture was built from the ground up to solve today's high-performance computing problems in reliability, security and energy consumption to serve all performance-demanding market segments.

Ceremorphic was founded in April 2020 by industry veteran Dr. Venkat Mattela, the founding CEO of Redpine Signals, which sold its wireless assets to Silicon Labs, Inc. in March 2020 for $308 million. Under his leadership, the team at Redpine Signals delivered breakthrough innovations and industry-first products that led to an ultra-low-power wireless solution outperforming products from industry giants in the wireless space by as much as 26 times on energy consumption. Ceremorphic leverages its own patented multi-thread processor technology, ThreadArch, combined with cutting-edge new technology developed by the silicon, algorithm, and software engineers it currently employs. This team is applying its deep expertise and patented technology to design an ultra-low-power training supercomputing chip.

AMD Files Patent for Chiplet Machine Learning Accelerator to be Paired With GPU, Cache Chiplets

AMD has filed a patent in which it describes an MLA (Machine Learning Accelerator) chiplet design that can be paired with a GPU unit (such as RDNA 3) and a cache unit (likely a GPU-excised version of AMD's Infinity Cache design, which debuted with RDNA 2) to create what AMD calls an "APD" (Accelerated Processing Device). The design would enable AMD to create a chiplet-based machine learning accelerator whose sole function is to accelerate machine learning - specifically, matrix multiplication. This would enable capabilities not unlike those available through NVIDIA's Tensor cores.

This could give AMD a modular way to add machine-learning capabilities to several of its designs through the inclusion of such a chiplet, and might be AMD's way of achieving hardware acceleration of a DLSS-like feature. It would avoid the shortcomings of implementing the same logic in the GPU die itself - namely an increase in overall die area, and thus higher cost and reduced yields - while also allowing AMD to deploy the chiplet in products other than GPU packages. The patent describes the possibility of different manufacturing technologies being employed across the chiplet-based design - harkening back to the I/O dies in Ryzen CPUs, manufactured on a 12 nm process rather than the 7 nm one used for the core chiplets. The patent also describes acceleration of cache requests from the GPU die to the cache chiplet, and on-the-fly use of the chiplet either as actual cache or as directly addressable memory.
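The workload such a chiplet would be dedicated to is ordinary matrix multiplication, the core of convolution and transformer layers. The sketch below shows the blocked form that matrix engines typically implement in hardware; it is a conceptual illustration only, not AMD's design, and the 16-wide tile size is an assumption:

```python
# Matrix multiplication, the operation a machine-learning accelerator chiplet
# would be dedicated to. Matrix engines process it in small tiles (e.g. 16x16
# blocks) so operands stay in fast local memory while being reused.
import numpy as np

def tiled_matmul(A, B, tile=16):
    m, k = A.shape
    k2, n = B.shape
    assert k == k2
    C = np.zeros((m, n), dtype=A.dtype)
    for i in range(0, m, tile):
        for j in range(0, n, tile):
            for p in range(0, k, tile):
                # Each small tile product is what a matrix engine computes natively.
                C[i:i+tile, j:j+tile] += A[i:i+tile, p:p+tile] @ B[p:p+tile, j:j+tile]
    return C

A = np.random.rand(64, 64).astype(np.float32)
B = np.random.rand(64, 64).astype(np.float32)
assert np.allclose(tiled_matmul(A, B), A @ B, atol=1e-4)
```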

GIGABYTE Joins MLCommons to Accelerate the Machine Learning Community

GIGABYTE Technology, an industry leader in high-performance servers and workstations, today announced that GIGABYTE is one of the founding members of MLCommons, an open engineering consortium with the goal of accelerating machine learning through community-driven benchmarking, large-scale open data sets, and best practices.

In 2018, a group from Google, Baidu, Harvard, and Stanford created a benchmarking suite for machine learning called MLPerf. Its purpose was to evaluate the performance of the new generation of accelerators on neural-network workloads. With benchmarking tools in hand, companies and universities can design hardware and software optimized for training and inference of machine learning workloads.

CXL Consortium Releases Compute Express Link 2.0 Specification

The CXL Consortium, an industry standards body dedicated to advancing Compute Express Link (CXL) technology, today announced the release of the CXL 2.0 specification. CXL is an open industry-standard interconnect offering coherency and memory semantics using high-bandwidth, low-latency connectivity between host processor and devices such as accelerators, memory buffers, and smart I/O devices. The CXL 2.0 specification adds support for switching for fan-out to connect to more devices; memory pooling for increased memory utilization efficiency and providing memory capacity on demand; and support for persistent memory - all while preserving industry investments by supporting full backwards compatibility with CXL 1.1 and 1.0.

"Datacenter architectures continue to evolve rapidly to support the growing demands of emerging workloads for Artificial Intelligence and Machine Learning, with CXL technology keeping pace to meet the performance and latency demands," said Barry McAuliffe, president, CXL Consortium. "Designed with breakthrough performance and easy adoption as guiding principles, the CXL 2.0 specification is a significant achievement from our dedicated technical work group members."

NVIDIA Surpasses Intel in Market Cap Size

Yesterday, after the stock market closed, NVIDIA officially surpassed Intel in market capitalization. In after-hours trading, NVIDIA (ticker: NVDA) stock stands at $411.20, for a market cap of 251.31 billion USD. It marks a historic day for NVIDIA, as the company has historically been smaller than Intel (ticker: INTC) - in the past, some even speculated that Intel could buy NVIDIA while the latter was much smaller. Intel's market cap now stands at 248.15 billion USD, slightly below NVIDIA's. However, market cap does not tell the whole story: NVIDIA's stock is fueled by the hype around Machine Learning and AI, while Intel's valuation does not rest on any potential bubble.

If we compare the two companies' revenues, Intel performs much better: it posted revenue of 71.9 billion USD in 2019, while NVIDIA reported 11.72 billion USD. NVIDIA has nonetheless done an impressive job, nearly doubling its revenue from $6.91 billion in 2017 to $11.72 billion in 2019, and market predictions are that its growth is not stopping. With the recent acquisition of Mellanox, the company now has much bigger opportunities for expansion and growth.
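Expressed as simple price-to-sales ratios, computed here for illustration from the figures quoted above, the gap in how the market values the two companies becomes clear:

```python
# Price-to-sales comparison using the figures quoted above (2019 revenue,
# after-hours market caps); purely illustrative arithmetic.
nvda_cap, nvda_rev = 251.31, 11.72   # billions USD
intc_cap, intc_rev = 248.15, 71.9
print(f"NVDA P/S ~ {nvda_cap / nvda_rev:.1f}")   # ~21x
print(f"INTC P/S ~ {intc_cap / intc_rev:.1f}")   # ~3.5x
```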

Arm Announces new IP Portfolio with Cortex-A78 CPU

During this unprecedented global health crisis, we have experienced rapid societal changes in how we interact with and rely on technology to connect, aid, and support us. As a result of this we are increasingly living our lives on our smartphones, which have been essential in helping feed our families through application-based grocery or meal delivery services, as well as virtually seeing our colleagues and loved ones daily. Without question, our Arm-based smartphones are the computing hub of our lives.

However, even before this increased reliance on our smartphones, there was already growing interest among users in exploring the limits of what is possible. The combination of these factors with the convergence of 5G and AI is generating greater demand for more performance and efficiency in the palm of our hands.

TSMC and Broadcom Enhance the CoWoS Platform with World's First 2X Reticle Size Interposer

TSMC today announced it has collaborated with Broadcom on enhancing the Chip-on-Wafer-on-Substrate (CoWoS) platform to support the industry's first and largest 2X reticle size interposer. With an area of approximately 1,700 mm², this next-generation CoWoS interposer technology significantly boosts computing power for advanced HPC systems by supporting more SoCs as well as being ready to support TSMC's next-generation five-nanometer (N5) process technology.

This new generation CoWoS technology can accommodate multiple logic system-on-chip (SoC) dies, and up to 6 cubes of high-bandwidth memory (HBM), offering as much as 96 GB of memory. It also provides bandwidth of up to 2.7 terabytes per second, 2.7 times faster than TSMC's previously offered CoWoS solution in 2016. With higher memory capacity and bandwidth, this CoWoS solution is well-suited for memory-intensive workloads such as deep learning, as well as workloads for 5G networking, power-efficient datacenters, and more. In addition to offering additional area to increase compute, I/O, and HBM integration, this enhanced CoWoS technology provides greater design flexibility and yield for complex ASIC designs in advanced process nodes.
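A quick sanity check of those figures against the HBM configuration (the per-stack numbers are derived here for illustration, not disclosed by TSMC):

```python
# Back-of-the-envelope check on the CoWoS memory figures quoted above.
# Per-stack capacity and bandwidth are computed here for illustration only.
stacks = 6
total_capacity_gb = 96
total_bw_tb_s = 2.7
print(f"capacity per HBM stack:  {total_capacity_gb / stacks:.0f} GB")        # 16 GB
print(f"bandwidth per HBM stack: {total_bw_tb_s * 1000 / stacks:.0f} GB/s")   # ~450 GB/s, HBM2E-class
```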

Intel in Negotiations for Habana Labs Acquisition

Intel is currently in negotiations to acquire Israeli AI chip startup Habana Labs, according to a person who spoke to Calcalist anonymously. If the deal materializes, Intel will pay between one and two billion USD, making it Intel's second-largest acquisition of an Israeli company. When asked about the potential deal, an Intel spokesperson stated that the company does not respond to rumors, while Habana Labs has yet to respond to a request for comment from Calcalist.

Founded in 2016 by Israeli entrepreneur Avigdor Willenz, who also founded Galileo Technologies and Annapurna Labs, Habana Labs develops processors for training and inference of Machine Learning models. The acquisition would allow Intel to compete better in the AI processor market and gain new customers that were previously exclusive to Habana Labs.

NVIDIA Leads the Edge AI Chipset Market but Competition is Intensifying: ABI Research

Diversity is the name of the game when it comes to the edge Artificial Intelligence (AI) chipset industry. In 2019, the AI industry is witnessing the continual migration of AI workloads, particularly AI inference, to edge devices, including on-premises servers, gateways, end devices, and sensors. Based on AI development in 17 vertical markets, ABI Research, a global tech market advisory firm, estimates that the edge AI chipset market will grow from US $2.6 billion in 2019 to US $7.6 billion by 2024, with no vendor commanding more than 40% of the market.
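That forecast works out to roughly a 24% compound annual growth rate over the five-year span, computed here from ABI's two endpoints:

```python
# Compound annual growth rate implied by ABI Research's 2019 -> 2024 forecast.
start, end, years = 2.6, 7.6, 5      # billions USD, number of years
cagr = (end / start) ** (1 / years) - 1
print(f"implied CAGR ~ {cagr:.1%}")  # ~24%
```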

The frontrunner of this market is NVIDIA, with a 39% revenue share in the first half of 2019. The GPU vendor has a strong presence in key AI verticals that are currently leading in AI deployments, such as automotive, camera systems, robotics, and smart manufacturing. "In the face of different use cases, NVIDIA chooses to release GPU chipsets with different computational and power budgets. In combination with its large developer ecosystem and partnerships with academic and research institutions, the chipset vendor has developed a strong foothold in the edge AI industry," said Lian Jye Su, Principal Analyst at ABI Research.

NVIDIA is facing stiff competition from Intel with its comprehensive chipset portfolio, from Xeon CPU to Mobileye and Movidius Myriad. At the same time, FPGA vendors, such as Xilinx, QuickLogic, and Lattice Semiconductor, are creating compelling solutions for industrial AI applications. One missing vertical from NVIDIA's wide footprint is consumer electronics, specifically smartphones. In recent years, AI processing in smartphones has been driven by smartphone chipset manufacturers and smartphone vendors, such as Qualcomm, Huawei, and Apple. In smart home applications, MediaTek and Amlogic are making their presence known through the widespread adoption of voice control front ends and smart appliances.

Compute Express Link Consortium (CXL) Officially Incorporates

Today, Alibaba, Cisco, Dell EMC, Facebook, Google, Hewlett Packard Enterprise, Huawei, Intel Corporation and Microsoft announced the incorporation of the Compute Express Link (CXL) Consortium and unveiled the newly elected members of its Board of Directors. The core group of key industry partners announced their intent to incorporate in March 2019 and remain dedicated to advancing the CXL standard, a new high-speed CPU-to-Device and CPU-to-Memory interconnect that accelerates next-generation data center performance.

The five new CXL board members are as follows: Steve Fields, Fellow and Chief Engineer of Power Systems, IBM; Gaurav Singh, Corporate Vice President, Xilinx; Dong Wei, Standards Architect and Fellow at ARM Holdings; Nathan Kalyanasundharam, Senior Fellow at AMD Semiconductor; and Larrie Carr, Fellow, Technical Strategy and Architecture, Data Center Solutions, Microchip Technology Inc.

India First Country to Deploy AI Machine Learning to Fight Income Tax Evasion

India is building a large AI/machine-learning data center that can crunch through trillions of financial transactions per hour to process the income tax returns of India's billion-strong assessee base. India's Income Tax Department has relied on human tax assessment officers, randomly assigned by a computer, to assess tax returns filed by individuals - an increasingly inefficient system prone to both evasion and corruption. India has been using machine learning since 2017 to flag cases of tax evasion for further human scrutiny. The AI now replaces human assessment officers, moving them up an escalation matrix.

The AI/ML assessment system is a logical next step to two big policy decisions the Indian government has taken in recent years: requiring 100% data localization by foreign entities conducting commerce in India; and moving India's vast population to electronic payment instruments and away from paper money, by demonetizing high-value currency and replacing it with a scarce supply of newer bank notes that effectively force people onto electronic instruments. Contributing to these efforts are some of the lowest 4G mobile data prices in the world (as low as $1.50 for 40 GB of 4G LTE data) and low-cost smartphone handsets. It's also free to open a basic bank account with no minimum balance requirements.

Logic Supply Unveils Karbon 300 Compact Rugged PC, Built For IoT

Global industrial and IoT hardware manufacturer Logic Supply has combined the latest vision processing, security protocols, wireless communication technologies, and proven cloud architectures to create the Karbon 300 rugged fanless computer. The system has been engineered to help innovators overcome the limitations of deploying reliable computer hardware in challenging environments.

"Computing at the edge is increasingly at the core of today's Industry 4.0 and Industrial IoT solutions," says Logic Supply VP of Products Murat Erdogan. "These devices are being deployed in environments that would quickly destroy traditional computer hardware. The builders and creators we work with require a careful combination of connectivity, processing and environmental protections. With Karbon 300, we're providing the ideal mix of capabilities to help make the next generation of industry-shaping innovation a reality, and enable innovators to truly challenge what's possible."

QNAP Officially Launches the TS-2888X AI-Ready NAS for Machine Learning

QNAP Systems, Inc. today officially launched the TS-2888X, an AI-Ready NAS specifically optimized for AI model training. Built using powerful Intel Xeon W processors with up to 18 cores and employing a flash-optimized hybrid storage architecture for IOPS-intensive workloads, the TS-2888X also supports installing up to 4 high-end graphics cards and runs QNAP's AI developer package "QuAI". The TS-2888X packs everything required for machine learning AI, greatly reducing latency, accelerating data transfer, and eliminating performance bottlenecks caused by network connectivity to expedite AI implementation.

Google Cloud Introduces NVIDIA Tesla P4 GPUs, for $430 per Month

Today, we are excited to announce a new addition to the Google Cloud Platform (GCP) GPU family that's optimized for graphics-intensive applications and machine learning inference: the NVIDIA Tesla P4 GPU.

We've come a long way since we introduced our first-generation compute accelerator, the K80 GPU, adding along the way P100 and V100 GPUs that are optimized for machine learning and HPC workloads. The new P4 accelerators, now in beta, provide a good balance of price/performance for remote display applications and real-time machine learning inference.

Khronos Group Releases NNEF 1.0 Standard for Neural Network Exchange

The Khronos Group, an open consortium of leading hardware and software companies creating advanced acceleration standards, announces the release of the Neural Network Exchange Format (NNEF) 1.0 Provisional Specification for universal exchange of trained neural networks between training frameworks and inference engines. NNEF reduces machine learning deployment fragmentation by enabling a rich mix of neural network training tools and inference engines to be used by applications across a diverse range of devices and platforms. The release of NNEF 1.0 as a provisional specification enables feedback from the industry to be incorporated before the specification is finalized - comments and feedback are welcome on the NNEF GitHub repository.

AMD also Announces Radeon Instinct MI8 and MI6 Machine Learning Accelerators

AMD also announced the Radeon Instinct MI8 and MI6 Machine Learning GPUs, based on Fiji and Polaris cores, respectively. These parts comprise the more "budget" end of what is still a decidedly non-consumer, high-end machine learning lineup. Still, with all parts using fairly modern cores, they aim to make an impact in their respective segments.

Starting with the Radeon Instinct MI8, we have a Fiji-based core with the familiar 4 GB of HBM1 memory and 512 GB/s of total memory bandwidth. It delivers 8.2 TFLOPS of either single-precision or half-precision floating-point performance (so, unlike its bigger Vega-based sibling, the MI25, performance does not double when going to half precision). It features 64 Compute Units.

The Radeon Instinct MI6 is a Polaris-based card and slightly slower than the MI8, despite having four times the memory at 16 GB of GDDR5. The likely reason is lower memory bandwidth, at only 224 GB/s. It also has fewer compute units, 36 in total, for 2304 stream processors. This all works out to a still respectable 5.7 TFLOPS of single- or half-precision floating-point performance (which, again, does not double at half precision as on Vega).
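Both figures follow from the usual peak-throughput formula, peak FLOPS = 2 × stream processors × clock; the clock speeds in the sketch below are back-calculated from the quoted TFLOPS numbers rather than taken from AMD's announcement:

```python
# Peak FLOPS = 2 ops (fused multiply-add) x stream processors x clock.
# Clocks are back-calculated from the quoted TFLOPS figures for illustration.
def peak_tflops(stream_processors, clock_ghz):
    return 2 * stream_processors * clock_ghz / 1000

# MI8: 64 CUs x 64 SPs = 4096 SPs; ~1.0 GHz yields the quoted 8.2 TFLOPS.
print(f"MI8: {peak_tflops(4096, 1.0):.1f} TFLOPS")
# MI6: 36 CUs x 64 SPs = 2304 SPs; ~1.24 GHz yields the quoted 5.7 TFLOPS.
print(f"MI6: {peak_tflops(2304, 1.24):.1f} TFLOPS")
```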

AMD Announces the Radeon Instinct MI25 Deep Learning Accelerator

AMD's EPYC launch presentation focused mainly on its line of datacenter processors, but fans of AMD's new Vega GPU lineup may be interested in another high-end product announced during the presentation. The Radeon Instinct MI25 is a Deep Learning accelerator, and as such is hardly intended for consumers, but it is Vega-based and potentially a very potent part of the company's portfolio all the same. Claiming a massive 24.6 TFLOPS of half-precision floating-point performance (12.3 TFLOPS single precision) from its 64 "next-gen" compute units, this machine is well suited to deep learning and machine intelligence applications. It comes with no less than 16 GB of HBM2 memory and 484 GB/s of memory bandwidth to play with.
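The same peak-throughput arithmetic applies here, with Vega's packed FP16 math doubling the half-precision rate; the ~1.5 GHz clock below is inferred from the quoted figures, not stated by AMD:

```python
# MI25: 64 CUs x 64 SPs = 4096 SPs at ~1.5 GHz (inferred from the quoted figures).
sps, clock_ghz = 4096, 1.5
fp32_tflops = 2 * sps * clock_ghz / 1000   # ~12.3 TFLOPS single precision
fp16_tflops = 2 * fp32_tflops              # Vega packed math doubles the FP16 rate
print(f"FP32 ~ {fp32_tflops:.1f} TFLOPS, FP16 ~ {fp16_tflops:.1f} TFLOPS")
```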