News Posts matching #LLM


Broadcom Partners with Google Cloud to Strengthen Gen AI-Powered Cybersecurity

Symantec, a division of Broadcom Inc., is partnering with Google Cloud to embed generative AI (gen AI) into the Symantec Security platform in a phased rollout that will give customers a significant technical edge for detecting, understanding, and remediating sophisticated cyber attacks.

Symantec is leveraging the Google Cloud Security AI Workbench and the security-specific large language model (LLM) Sec-PaLM 2 across its portfolio to enable natural language interfaces and generate more comprehensive and easy-to-understand threat analyses. With Security AI Workbench-powered summarization of complex incidents and alignment to MITRE ATT&CK context, security operations center (SOC) analysts of all levels can better understand threats and respond faster. That, in turn, translates into greater security and higher SOC productivity.

Useful Sensors Launches AI-In-A-Box Module, a Low Cost Offline Solution

Useful Sensors, an AI-focused start-up, today launched the world's first low-cost, off-the-shelf AI module to enable intuitive, natural language interaction with electronic devices, locally and privately, with no need for an account or internet connection. The new AI-In-A-Box module can answer queries and solve problems in a way similar to well-known AI tools based on a large language model (LLM). But thanks to compression and acceleration technologies developed by Useful Sensors, the module hosts its LLM file locally, enabling its low-cost microprocessor to understand and respond instantly to spoken natural language queries or commands without reference to a data center.

Disconnected from the internet, the AI-In-A-Box module definitively eliminates user concerns about privacy, snooping, or dependence on third-party cloud services that are prevalent with conventional LLM-based AI products and services marketed by large technology companies. The AI-In-A-Box module is available to buy now at CrowdSupply, priced at $299.

UPMEM Raises €7M to Revolutionize AI and Analytics Processing

UPMEM, a fabless semiconductor startup, has raised €4.1M in equity from the European Innovation Council (EIC) Fund and venture capitalists (Partech, Western Digital Capital, C4 Ventures…), and a €2.5M grant from the EIC. Founded by Fabrice Devaux and Gilles Hamou, the company is pioneering ultra-efficient Processing In Memory (PIM) accelerators to tackle the significant challenge of compute efficiency for AI and big data applications.

UPMEM's PIM solution, integrating UPMEM's first commercial-grade PIM chip on the market, is now available to cloud markets across the globe (US, Asia...) to provide the most cost-effective and energy-efficient solutions for AI and analytics applications in data centers and at the edge, such as large language models (LLMs, e.g. GPT), genomics, and large analytics.

d-Matrix Announces $110 Million in Funding for Corsair Inference Compute Platform

d-Matrix, the leader in high-efficiency generative AI compute for data centers, has closed $110 million in a Series-B funding round led by Singapore-based global investment firm Temasek. The goal of the fundraise is to enable d-Matrix to begin commercializing Corsair, the world's first Digital In-Memory Compute (DIMC), chiplet-based inference compute platform, after the successful launches of its prior Nighthawk, Jayhawk-I and Jayhawk II chiplets.

d-Matrix's recent silicon announcement, Jayhawk II, is the latest example of how the company is working to fundamentally change the physics of memory-bound compute workloads common in generative AI and large language model (LLM) applications. With the explosion of this revolutionary technology over the past nine months, there has never been a greater need to overcome the memory bottleneck and current technology approaches that limit performance and drive up AI compute costs.

NVIDIA Paves the Way for Natural Speech Conversations with Game NPCs

Imagine you're in a vast RPG filled with hundreds, if not thousands, of interactive NPCs (non-playable characters). Current RPGs typically conduct your interactions with them through pre-defined statement selections: you choose among text-based options on the screen, and each elicits a certain response from the NPC. This feels unnatural and railroaded, but NVIDIA plans to change it. With ACE (character engine) and NeMo SteerLM (a natural language model), NVIDIA wants to make voice-based interactions with NPCs possible. This is a necessary stepping stone toward the near future, where NPCs will be backed by large GPTs, letting you have lengthy conversations with them.

The way this works is that the player gives an NPC a natural language voice input. A speech-to-text engine and an LLM process the voice input and generate a natural language response. Omniverse Audio2Face is leveraged to create the NPC's response in real time. Announced at Gamescom, NVIDIA's new NeMo SteerLM adds life to the part of ACE that processes the natural voice input and, based on the personality traits the game developer gives an NPC, generates responses with varying degrees of creativity, humor, and toxicity, among other attributes.

OpenAI Degrades GPT-4 Performance While GPT-3.5 Gets Better

When OpenAI announced its GPT-4 model, it first became a part of ChatGPT, behind the paywall for premium users. GPT-4 is the latest installment in the Generative Pretrained Transformer (GPT) family of Large Language Models (LLMs). It aims to be a more capable version of GPT-3.5, the model that powered ChatGPT at first and was already capable at launch. However, it seems that the performance of GPT-4 has been steadily dropping since its introduction. Many users noted the regression, and now researchers from Stanford University and UC Berkeley have benchmarked GPT-4's performance in March 2023 against its performance in June 2023 on tasks like solving math problems, visual reasoning, code generation, and answering sensitive questions.

The results? The paper shows that GPT-4 performance has been significantly degraded in all the tasks. This could be attributed to improving stability, lowering the massive compute demand, and much more. Unexpectedly, GPT-3.5 experienced a significant uplift in the same period. Below, you can see the examples that were benchmarked by the researchers, which also compare GPT-4 and GPT-3.5 performance in all cases.

NVIDIA Espouses Generative AI for Improved Productivity Across Industries

A watershed moment on Nov. 30, 2022, was mostly virtual, yet it shook the foundations of nearly every industry on the planet. On that day, OpenAI released ChatGPT, the most advanced artificial intelligence chatbot ever developed. This set off demand for generative AI applications that help businesses become more efficient, from providing consumers with answers to their questions to accelerating the work of researchers as they seek scientific breakthroughs, and much, much more.

Businesses that previously dabbled in AI are now rushing to adopt and deploy the latest applications. Generative AI—the ability of algorithms to create new text, images, sounds, animations, 3D models and even computer code—is moving at warp speed, transforming the way people work and play. By employing large language models (LLMs) to handle queries, the technology can dramatically reduce the time people devote to manual tasks like searching for and compiling information.

AMD CEO Lisa Su Notes: AI to Dominate Chip Design

Artificial intelligence (AI) has emerged as a transformative force in chip design, with recent examples from China and the United States showcasing its potential. Jensen Huang, CEO of Nvidia, believes that AI can empower individuals to become programmers, while Lisa Su, CEO of AMD, predicts an era where AI dominates chip design. During the 2023 World Artificial Intelligence Conference (WAIC) in Shanghai, Su emphasized the importance of interdisciplinary collaboration for the next generation of chip designers. To excel in this field, engineers must possess a holistic understanding of hardware, software, and algorithms, enabling them to create superior chip designs that meet system usage, customer deployment, and application requirements.

The integration of AI into chip design processes has gained momentum, fueled by the AI revolution catalyzed by large language models (LLMs). Both Huang and Mark Papermaster, CTO of AMD, acknowledge the benefits of AI in accelerating computation and facilitating chip design. AMD has already started leveraging AI in semiconductor design, testing, and verification, with plans to expand its use of generative AI in chip design applications. Companies are now actively exploring the fusion of AI technology with Electronic Design Automation (EDA) tools to streamline complex tasks and minimize manual intervention in chip design. Despite limited data and accuracy challenges, the "EDA+AI" approach holds great promise. For instance, Synopsys has invested significantly in AI tool research and recently launched Synopsys.ai, the industry's first end-to-end AI-driven EDA solution. This comprehensive solution empowers developers to harness AI at every stage of chip development, from system architecture and design to manufacturing, marking a significant leap forward in AI's integration into chip design workflows.

Oracle Fusion Cloud HCM Enhanced with Generative AI, Projected to Boost HR Productivity

Oracle today announced the addition of generative AI-powered capabilities within Oracle Fusion Cloud Human Capital Management (HCM). Supported by the Oracle Cloud Infrastructure (OCI) generative AI service, the new capabilities are embedded in existing HR processes to drive faster business value, improve productivity, enhance the candidate and employee experience, and streamline HR processes.

"Generative AI is boosting productivity and unlocking a new world of skills, ideas, and creativity that can have an immediate impact in the workplace," said Chris Leone, executive vice president, applications development, Oracle Cloud HCM. "With the ability to summarize, author, and recommend content, generative AI helps to reduce friction as employees complete important HR functions. For example, with the new embedded generative AI capabilities in Oracle Cloud HCM, our customers will be able to take advantage of large language models to drastically reduce the time required to complete tasks, improve the employee experience, enhance the accuracy of workforce insights, and ultimately increase business value."

NVIDIA Cambridge-1 AI Supercomputer Hooked up to DGX Cloud Platform

Scientific researchers need massive computational resources that can support exploration wherever it happens. Whether they're conducting groundbreaking pharmaceutical research, exploring alternative energy sources or discovering new ways to prevent financial fraud, accessible state-of-the-art AI computing resources are key to driving innovation. This new model of computing can solve the challenges of generative AI and power the next wave of innovation. Cambridge-1, a supercomputer NVIDIA launched in the U.K. during the pandemic, has powered discoveries from some of the country's top healthcare researchers. The system is now becoming part of NVIDIA DGX Cloud to accelerate the pace of scientific innovation and discovery - across almost every industry.

As a cloud-based resource, it will broaden access to AI supercomputing for researchers in climate science, autonomous machines, worker safety and other areas, delivered with the simplicity and speed of the cloud, ideally located for the U.K. and European access. DGX Cloud is a multinode AI training service that makes it possible for any enterprise to access leading-edge supercomputing resources from a browser. The original Cambridge-1 infrastructure included 80 NVIDIA DGX systems; now it will join with DGX Cloud, to allow customers access to world-class infrastructure.

ASUS Demonstrates Liquid Cooling and AI Solutions at ISC High Performance 2023

ASUS today announced a showcase of the latest HPC solutions to empower innovation and push the boundaries of supercomputing, at ISC High Performance 2023 in Hamburg, Germany on May 21-25, 2023. The ASUS exhibition, at booth H813, will reveal the latest supercomputing advances, including liquid-cooling and AI solutions, as well as outlining a slew of sustainability breakthroughs - plus a whole lot more besides.

Comprehensive Liquid-Cooling Solutions
ASUS is working with Submer, the industry-leading liquid-cooling provider, to demonstrate immersion-cooling solutions at ISC High Performance 2023, focused on the ASUS RS720-E11-IM, the Intel-based 2U4N server that leverages our trusted legacy server architecture and popular features to create a compact new design. This fresh outlook improves the accessibility of I/O ports, storage and cable routing, and strengthens the structure to allow the server to be placed vertically in the tank, with durability assured.

NVIDIA A800 China-Tailored GPU Performance within 70% of A100

The recent growth in demand for training Large Language Models (LLMs) like Generative Pre-trained Transformer (GPT) has sparked the interest of many companies in investing in the GPU solutions used to train these models. However, countries like China have struggled with US sanctions, and NVIDIA has had to create custom models that meet US export regulations. The two resulting GPUs, H800 and A800, are cut-down versions of the original H100 and A100, respectively. We previously reported on the H800; however, it has remained as mysterious as the A800 that we are talking about today. Thanks to MyDrivers, we have information that the A800 GPU performance is within 70% of the regular A100.

The regular A100 GPU manages 9.7 TeraFLOPS of FP64, 19.5 TeraFLOPS of FP64 Tensor, and up to 624 BF16/FP16 TeraFLOPS with sparsity. Rough napkin math suggests that 70% of the original's performance (a 30% cut) would equal 6.8 TeraFLOPS of FP64, 13.7 TeraFLOPS of FP64 Tensor, and 437 BF16/FP16 TeraFLOPS with sparsity. MyDrivers notes that the A800 can be had for 100,000 Yuan, translating to about 14,462 USD at the time of writing. This is not the most capable GPU that Chinese companies can acquire, as the H800 exists. However, we don't have any information about its performance for now.
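
The napkin math above can be reproduced in a few lines of Python. This is only a sketch: applying the 70% figure uniformly to every precision is an assumption taken from the estimate in the text, not a measured result, and the names `A100_TFLOPS` and `scale_throughput` are illustrative. Decimal arithmetic is used so that rounding to one decimal place behaves like rounding on paper.

```python
from decimal import Decimal, ROUND_HALF_UP

# A100 peak throughput figures quoted in the article, in TeraFLOPS.
A100_TFLOPS = {
    "FP64": Decimal("9.7"),
    "FP64 Tensor": Decimal("19.5"),
    "BF16/FP16 with sparsity": Decimal("624"),
}


def scale_throughput(figures, factor):
    """Scale each peak figure by the same factor, rounded to one decimal.

    Assumes the performance cut applies equally to every precision.
    """
    tenth = Decimal("0.1")
    return {
        name: (value * factor).quantize(tenth, rounding=ROUND_HALF_UP)
        for name, value in figures.items()
    }


# 70% of the A100 figures, i.e. a 30% cut.
a800_estimate = scale_throughput(A100_TFLOPS, Decimal("0.70"))

for name, value in a800_estimate.items():
    print(f"{name}: {value} TeraFLOPS")
```

The results line up with the figures quoted above once the BF16/FP16 value is rounded to a whole number.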

NVIDIA H100 Compared to A100 for Training GPT Large Language Models

NVIDIA's H100 has recently become available to use via Cloud Service Providers (CSPs), and it was only a matter of time before someone decided to benchmark its performance and compare it to the previous generation's A100 GPU. Today, thanks to the benchmarks of MosaicML, a startup company led by the ex-CEO of Nervana and GM of Artificial Intelligence (AI) at Intel, Naveen Rao, we have some comparison between these two GPUs with a fascinating insight about the cost factor. Firstly, MosaicML has taken Generative Pre-trained Transformer (GPT) models of various sizes and trained them using bfloat16 and FP8 Floating Point precision formats. All training occurred on CoreWeave cloud GPU instances.

Regarding performance, the NVIDIA H100 GPU achieved anywhere from a 2.2x to a 3.3x speedup. However, an interesting finding emerges when comparing the cost of running these GPUs in the cloud. CoreWeave prices the H100 SXM GPUs at $4.76/hr/GPU, while the A100 80 GB SXM gets $2.21/hr/GPU pricing. While the H100 is about 2.2x as expensive per hour, the performance makes up for it, resulting in less time to train a model and a lower price for the training process. This makes the H100 more attractive for researchers and companies wanting to train Large Language Models (LLMs) and makes choosing the newer GPU more viable, despite the increased hourly cost. Below, you can see tables of comparison between the two GPUs in training time, speedup, and cost of training.
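
The cost argument can be checked with a short calculation. The sketch below uses only the hourly prices and speedup range quoted above; it assumes the same number of GPUs on both sides and ignores storage, networking, and other charges, and the function name `relative_training_cost` is illustrative.

```python
# Hourly prices quoted in the article, in USD per GPU-hour.
H100_PRICE = 4.76
A100_PRICE = 2.21


def relative_training_cost(speedup, h100_price=H100_PRICE, a100_price=A100_PRICE):
    """Return the H100 training cost as a fraction of the A100 cost.

    With a given speedup, the H100 needs 1/speedup of the A100's GPU-hours,
    so the cost ratio is the price ratio divided by the speedup.
    """
    return (h100_price / a100_price) / speedup


price_ratio = H100_PRICE / A100_PRICE
print(f"H100 hourly price is {price_ratio:.2f}x the A100 price")

# Evaluate both ends of the reported speedup range.
for speedup in (2.2, 3.3):
    fraction = relative_training_cost(speedup)
    print(f"At {speedup}x speedup, H100 training costs {fraction:.0%} of the A100 run")
```

Under these assumptions the H100 roughly breaks even at the low end of the speedup range and costs about a third less at the high end.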

NVIDIA Wants to Set Guardrails for Large Language Models Such as ChatGPT

ChatGPT has surged in popularity over a few months, and it has been regarded as one of the fastest-growing apps ever. Based on a Large Language Model (LLM) called GPT-3.5/4, ChatGPT uses user input to form answers based on the extensive dataset used in the training process. Having billions of parameters, the GPT models used for ChatGPT can give precise answers; however, sometimes, these models hallucinate. Given a question about a non-existent topic or subject, ChatGPT can hallucinate and make up information. To prevent these hallucinations, NVIDIA, the maker of GPUs used for training and inferencing LLMs, has released a software library to keep AI in check, called NeMo Guardrails.

As the NVIDIA repository states: "NeMo Guardrails is an open-source toolkit for easily adding programmable guardrails to LLM-based conversational systems. Guardrails (or "rails" for short) are specific ways of controlling the output of a large language model, such as not talking about politics, responding in a particular way to specific user requests, following a predefined dialog path, using a particular language style, extracting structured data, and more." These guardrails are easily programmable and can stop LLMs from outputting unwanted content. For a company that invests heavily in the hardware and software landscape, this launch is a logical decision to keep the lead in setting the infrastructure for future LLM-based applications.
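
To make the idea of a "rail" concrete, here is a toy illustration in plain Python. It is not the NeMo Guardrails API and does not reflect how that toolkit is implemented; the keyword list, the `fake_llm` stand-in, and every function name are hypothetical. It only shows the general pattern of checking both the user's message and the model's answer against a blocked topic before anything is returned.

```python
# Hypothetical blocked topics and trigger keywords, for illustration only.
BLOCKED_TOPICS = {
    "politics": ("election", "political party", "senator"),
}
REFUSAL = "Sorry, I can't discuss that topic."


def fake_llm(prompt):
    """Stand-in for a real model call (hypothetical, returns canned text)."""
    return f"Model answer to: {prompt}"


def matched_topic(text):
    """Return the first blocked topic whose keywords appear in the text."""
    lowered = text.lower()
    for topic, keywords in BLOCKED_TOPICS.items():
        if any(keyword in lowered for keyword in keywords):
            return topic
    return None


def guarded_reply(user_message, llm=fake_llm):
    """Apply an input rail, call the model, then apply an output rail."""
    if matched_topic(user_message) is not None:
        return REFUSAL  # input rail: refuse before calling the model
    answer = llm(user_message)
    if matched_topic(answer) is not None:
        return REFUSAL  # output rail: suppress an unwanted answer
    return answer


print(guarded_reply("Who should win the election?"))
print(guarded_reply("What is a GPU?"))
```

A real guardrail system would rely on far more robust matching than keyword lookup; the sketch only shows where the checks sit relative to the model call.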