IBM may not have announced a new mainframe, but it has told us all about the chips that will be powering those mainframes – and it’s very much aimed at making artificial intelligence (AI) software run faster and better.
Let’s take a look
at the details.
Back in 2021, we
heard about the Telum I processor with its on-chip AI accelerator for inferencing.
Now we hear that the Telum II processor has improved AI acceleration and has an
IBM Spyre™ Accelerator. We’ll get to see these chips in 2025.
The new chip has
been developed using Samsung 5nm technology and has 43 billion transistors. It will
feature eight high-performance cores running at 5.5GHz. The Telum II chip will include
a 40% increase in on-chip cache capacity, with the virtual L3 and virtual L4 growing
to 360MB and 2.88GB respectively. The processor integrates a new data processing
unit (DPU) specialized for I/O acceleration and the next generation of on-chip AI
acceleration. These hardware enhancements are designed to provide significant performance
improvements for clients over previous generations.
Because the
integrated DPU has to handle tens of thousands of outstanding I/O requests, instead
of putting it behind the PCIe bus, IBM has connected it coherently and given it its
own L2 cache. IBM says this increases performance and power efficiency. In
fact, there are ten 36MB L2 caches with eight 5.5GHz cores running at a fixed
frequency. The onboard AI accelerator runs at 24 trillion operations per second
(TOPS). IBM claims the new chip offers increased frequency, memory capacity, and
an integrated AI accelerator core. This allows it to handle larger and more
complex datasets efficiently.
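The cache figures quoted so far are consistent with each other, which a few lines of arithmetic can confirm. The sketch below is only a sanity check on the published numbers. The reading that the virtual L4 pools the caches of eight chips is an assumption on my part, not something IBM states here.

```python
# Sanity check on the cache figures quoted in this article.
L2_CACHES_PER_CHIP = 10   # "ten 36MB L2 caches"
L2_CACHE_MB = 36

# Ten 36MB L2 caches add up to the quoted virtual L3 size.
virtual_l3_mb = L2_CACHES_PER_CHIP * L2_CACHE_MB
print(virtual_l3_mb)      # 360

# 2.88GB of virtual L4, taking 1GB = 1000MB.
VIRTUAL_L4_MB = 2880
chips_pooled = VIRTUAL_L4_MB // virtual_l3_mb
print(chips_pooled)       # 8 (assumed: eight chips sharing their caches)
```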
You might be
wondering why AI on a chip is so important. IBM explains that its AI-driven fraud
detection solutions are designed to save clients millions of dollars annually.
The compute power
of each accelerator is expected to be improved by a factor of 4, reaching that 24
trillion operations per second we just mentioned. Telum II is engineered to enable
model runtimes to sit side by side with the most demanding enterprise workloads,
while delivering high throughput, low-latency inferencing. Additionally, support
for INT8 as a data type has been added to enhance compute capacity and efficiency
for applications where INT8 is preferred, thereby enabling the use of newer models.
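To give a feel for what INT8 means in practice, here is a minimal sketch of symmetric quantization, a common generic way of squeezing floating-point values into 8-bit integers. It is not IBM's implementation, and the function names `quantize_int8` and `dequantize` are made up for illustration.

```python
def quantize_int8(values):
    """Map floats to INT8 using one shared scale (symmetric quantization)."""
    max_abs = max(abs(v) for v in values)
    # Avoid dividing by zero when every value is 0.
    scale = max_abs / 127 if max_abs else 1.0
    # Round to the nearest step and clamp to the symmetric INT8 range.
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the INT8 values."""
    return [q * scale for q in quantized]


weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
print(q)
print(restored)
```

The restored values are close to the originals but not identical. That small loss of precision is the price paid for storing and computing on a quarter of the bits of a 32-bit float.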
New compute primitives
have also been incorporated to better support large language models within the accelerator.
They are designed to support an increasingly broad range of AI models for a comprehensive
analysis of both structured and textual data.
IBM has also
made system-level enhancements in the processor drawer. These enhancements enable
each AI accelerator to accept work from any core in the same drawer to improve the
load balancing across all eight of those AI accelerators. This gives each core access
to more low-latency AI acceleration, with 192 TOPS available across all the AI
accelerators in a fully configured drawer.
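The drawer-level numbers follow directly from the per-chip figures, and the load-balancing idea can be shown with a toy dispatcher. This is a rough sketch under my own assumptions: the `dispatch` function and its least-loaded policy are hypothetical, and IBM has not described its scheduling at this level of detail.

```python
TOPS_PER_ACCELERATOR = 24
ACCELERATORS_PER_DRAWER = 8
IMPROVEMENT_FACTOR = 4

# Eight accelerators at 24 TOPS each give the quoted drawer total.
drawer_tops = TOPS_PER_ACCELERATOR * ACCELERATORS_PER_DRAWER
print(drawer_tops)        # 192

# Implied figure for the previous accelerator (derived, not quoted by IBM).
previous_tops = TOPS_PER_ACCELERATOR / IMPROVEMENT_FACTOR
print(previous_tops)      # 6.0


def dispatch(queue_depths):
    """Send a job to the least-loaded accelerator and return its index."""
    target = min(range(len(queue_depths)), key=queue_depths.__getitem__)
    queue_depths[target] += 1
    return target


# Any core can submit work, so 16 jobs spread evenly over the 8 accelerators.
queues = [0] * ACCELERATORS_PER_DRAWER
for _ in range(16):
    dispatch(queues)
print(queues)             # [2, 2, 2, 2, 2, 2, 2, 2]
```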
Brand new is the
IBM Spyre Accelerator, which was jointly developed by IBM Research and IBM Infrastructure
development. It is geared toward handling complex AI models and generative AI
use cases. The Spyre Accelerator will contain 32 AI accelerator cores that will
share a similar architecture to the AI accelerator integrated into the Telum II
chip. Multiple IBM Spyre Accelerators can be connected into the I/O Subsystem of
IBM Z via PCIe.
The integration
of Telum II and Spyre accelerators eliminates the need to transfer data to
external GPU-equipped servers, which enhances the mainframe's reliability
and security and can substantially increase the amount of available
acceleration.
Both the IBM Telum
II and the Spyre Accelerator are designed to support a broader, larger set of models
for what are called ensemble AI use cases. Ensemble AI leverages
the strengths of multiple AI models to improve the overall performance and accuracy of
a prediction compared with individual models.
IBM suggests insurance
claims fraud detection as an example of an ensemble AI method. Traditional neural
networks are designed to provide an initial risk assessment, and when combined with
large language models (LLMs), they are geared to enhance performance and accuracy.
Similarly, these ensemble AI techniques can drive advanced detection for suspicious
financial activities, supporting compliance with regulatory requirements and mitigating
the risk of financial crimes.
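One way to picture the ensemble approach is a fast model scoring every claim, with a second, slower model consulted only for the borderline ones. The sketch below is purely illustrative: the thresholds, the weighting, and the names `ensemble_score`, `assess_claim`, and `toy_llm_scorer` are all hypothetical, and the LLM is replaced by a trivial stand-in.

```python
def ensemble_score(nn_score, llm_score, nn_weight=0.6):
    """Weighted average of two model scores, each in the range 0..1."""
    return nn_weight * nn_score + (1 - nn_weight) * llm_score


def assess_claim(nn_score, llm_scorer, claim_text, low=0.2, high=0.8):
    """Return a fraud risk score, asking the second model only when needed."""
    # Confident results from the fast model are accepted as they are.
    if nn_score <= low or nn_score >= high:
        return nn_score
    # Borderline claims get a second opinion from the slower model.
    return ensemble_score(nn_score, llm_scorer(claim_text))


def toy_llm_scorer(text):
    # Hypothetical stand-in: a real system would call a language model here.
    return 0.9 if "cash only" in text.lower() else 0.1


risk = assess_claim(0.5, toy_llm_scorer, "Repair paid cash only, no receipt")
print(round(risk, 2))
```

The design choice here is that the expensive model runs only on the uncertain middle band, which keeps latency low for the bulk of transactions.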
The new Telum II
processor and IBM Spyre Accelerator are engineered for a broader set of AI use cases
to accelerate and deliver on client business outcomes. We look forward to
seeing them in the new IBM mainframes next year.