Mainframe Update: A chip off the new block

Sunday, 8 September 2024


IBM may not have announced a new mainframe, but it has told us all about the chips that will be powering those mainframes – and they're very much aimed at making artificial intelligence (AI) software run faster and better.

Let’s take a look at the details.

Back in 2021, we heard about the Telum I processor with its on-chip AI accelerator for inferencing. Now IBM has announced the Telum II processor, with improved AI acceleration, alongside a new IBM Spyre™ Accelerator. We'll get to see these chips in 2025.

The new chip has been developed using Samsung 5nm technology and has 43 billion transistors. It will feature eight high-performance cores running at 5.5GHz. The Telum II chip will have 40% more on-chip cache capacity, with the virtual L3 and virtual L4 growing to 360MB and 2.88GB respectively. The processor integrates a new data processing unit (DPU) specialized for I/O acceleration, as well as the next generation of on-chip AI acceleration. IBM says these hardware enhancements are designed to provide significant performance improvements for clients over previous generations.

Because the integrated DPU has to handle tens of thousands of outstanding I/O requests, IBM has coherently connected it to the processor, rather than putting it behind the PCIe bus, and given it its own L2 cache. IBM says this increases performance and power efficiency. In fact, there are ten 36MB L2 caches, alongside eight 5.5GHz cores running at a fixed frequency. IBM says the new processor offers increased frequency, memory capacity, and an integrated AI accelerator core, which allows it to handle larger and more complex datasets efficiently. The onboard AI accelerator runs at 24 trillion operations per second (TOPS).
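For anyone wondering how those cache numbers hang together, here's a quick back-of-the-envelope sketch in Python. It assumes, as IBM's figures imply, that a chip's virtual L3 is the total of its ten 36MB L2 caches and that the virtual L4 pools the virtual L3 of the eight chips in a processor drawer. The Telum I comparison assumes that chip's layout of eight 32MB L2 caches. The function names are just illustrative.

```python
# Back-of-the-envelope check of the Telum II cache figures.
# Telum II numbers are from IBM's announcement; the Telum I layout
# (eight 32MB L2 caches) is from IBM's 2021 Telum disclosure.
CHIPS_PER_DRAWER = 8  # one AI accelerator per chip, eight per drawer


def virtual_l3_mb(l2_caches: int, l2_size_mb: int) -> int:
    """A chip's virtual L3 is carved out of its L2 caches, so it is their sum."""
    return l2_caches * l2_size_mb


def virtual_l4_mb(l3_mb: int, chips: int = CHIPS_PER_DRAWER) -> int:
    """The virtual L4 pools the virtual L3 of every chip in the drawer."""
    return l3_mb * chips


telum2_l3 = virtual_l3_mb(10, 36)     # 360 MB
telum2_l4 = virtual_l4_mb(telum2_l3)  # 2,880 MB, i.e. 2.88GB
telum1_l3 = virtual_l3_mb(8, 32)      # 256 MB
growth = telum2_l3 / telum1_l3 - 1    # 0.40625, IBM's "40%"

print(f"Telum II virtual L3: {telum2_l3}MB, virtual L4: {telum2_l4 / 1000}GB")
print(f"Growth over Telum I: {growth:.1%}")
```

The 40.6% that falls out of this is what IBM rounds to "a 40% increase".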

You might be wondering why AI on a chip is so important. IBM explains that its AI-driven fraud detection solutions are designed to save clients millions of dollars annually.

The compute power of each accelerator is expected to be improved by a factor of 4, reaching that 24 trillion operations per second we just mentioned. Telum II is engineered to enable model runtimes to sit side by side with the most demanding enterprise workloads, while delivering high throughput, low-latency inferencing. Additionally, support for INT8 as a data type has been added to enhance compute capacity and efficiency for applications where INT8 is preferred, thereby enabling the use of newer models.
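Why does INT8 help? Each value takes a quarter of the space of a 32-bit float, so the hardware can do more multiply-accumulate operations per cycle and move less data. The sketch below shows generic symmetric INT8 quantization of a dot product. It's a textbook illustration, not IBM's implementation: the per-tensor scale and the helper names are assumptions for the example.

```python
def quantize_int8(values: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor quantization: map floats onto integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs else 1.0
    return [max(-127, min(127, round(v / scale))) for v in values], scale


def int8_dot(qa: list[int], scale_a: float, qb: list[int], scale_b: float) -> float:
    """Multiply-accumulate in integers (int32 accumulators on real hardware),
    then rescale to floating point once at the end."""
    acc = sum(a * b for a, b in zip(qa, qb))
    return acc * scale_a * scale_b


weights = [0.5, -1.27, 0.02]
activations = [1.0, 2.0, -0.5]
exact = sum(w * x for w, x in zip(weights, activations))  # about -2.05
qw, sw = quantize_int8(weights)
qx, sx = quantize_int8(activations)
approx = int8_dot(qw, sw, qx, sx)
print(f"float: {exact:.4f}  int8: {approx:.4f}")
```

The INT8 result lands close to the floating-point one; that small loss of precision is the trade-off for the extra throughput.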

New compute primitives have also been incorporated to better support large language models within the accelerator. They are designed to support an increasingly broader range of AI models for a comprehensive analysis of both structured and textual data.

IBM has also made system-level enhancements in the processor drawer. These enable each AI accelerator to accept work from any core in the same drawer, improving load balancing across all eight AI accelerators in the drawer. This gives each core access to more low-latency AI acceleration, with up to 192 TOPS available across all the AI accelerators in a fully configured drawer.
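The 192 TOPS figure is simply eight accelerators at 24 TOPS each. The toy model below also shows why drawer-wide sharing matters. It compares a hypothetical "local only" setup, where work stays on the issuing chip's accelerator, with a pooled setup that sends each job to the least-loaded accelerator in the drawer. The greedy least-loaded policy and the job sizes are assumptions for illustration; IBM hasn't described its actual scheduling logic here.

```python
import heapq

ACCELERATORS_PER_DRAWER = 8  # one per Telum II chip
TOPS_PER_ACCELERATOR = 24    # per-accelerator peak quoted by IBM


def drawer_peak_tops(n: int = ACCELERATORS_PER_DRAWER,
                     tops: int = TOPS_PER_ACCELERATOR) -> int:
    return n * tops  # 8 x 24 = 192 TOPS


def local_only(jobs: list[tuple[int, int]], n: int = ACCELERATORS_PER_DRAWER) -> list[int]:
    """Each (chip, ops) job runs on the accelerator of the chip that issued it."""
    loads = [0] * n
    for chip, ops in jobs:
        loads[chip] += ops
    return loads


def drawer_pooled(jobs: list[tuple[int, int]], n: int = ACCELERATORS_PER_DRAWER) -> list[int]:
    """Any core may use any accelerator in the drawer: send each job to the
    least-loaded one (a hypothetical greedy policy, not IBM's firmware)."""
    heap = [(0, i) for i in range(n)]  # (queued ops, accelerator id)
    loads = [0] * n
    for _chip, ops in jobs:
        load, acc = heapq.heappop(heap)
        loads[acc] = load + ops
        heapq.heappush(heap, (loads[acc], acc))
    return loads


# A burst of 16 equal jobs, all issued by cores on chip 0.
burst = [(0, 10)] * 16
print(drawer_peak_tops())         # 192
print(max(local_only(burst)))     # 160: one accelerator does all the work
print(max(drawer_pooled(burst)))  # 20: the work is spread across all eight
```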

Brand new is the IBM Spyre Accelerator, which was jointly developed by IBM Research and IBM Infrastructure development. It is geared toward handling complex AI models and generative AI use cases. The Spyre Accelerator will contain 32 AI accelerator cores that share a similar architecture to the AI accelerator integrated into the Telum II chip. Multiple IBM Spyre Accelerators can be connected into the I/O subsystem of IBM Z via PCIe.

Using Telum II and Spyre accelerators together removes the need to move data to external GPU-equipped servers, which IBM says enhances the mainframe's reliability and security, and can substantially increase the amount of available acceleration.

Both the IBM Telum II and the Spyre Accelerator are designed to support a broader, larger set of models, including what IBM calls ensemble AI use cases. Ensemble AI combines the strengths of multiple AI models to improve the overall performance and accuracy of a prediction compared with any individual model.
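One common way to build an ensemble is weighted "soft voting": each model produces a probability and the ensemble takes a weighted average. The Python sketch below shows that idea with two deliberately trivial stand-in models. The models, weights, and claim fields are hypothetical, and IBM hasn't said this is how its ensembles combine scores.

```python
from typing import Callable, Sequence

# A "model" here is anything that maps a claim to a fraud probability in [0, 1].
Model = Callable[[dict], float]


def ensemble_score(models: Sequence[Model], weights: Sequence[float], claim: dict) -> float:
    """Weighted soft voting: a weighted average of each model's probability."""
    if len(models) != len(weights) or not models:
        raise ValueError("need at least one model and exactly one weight per model")
    return sum(w * m(claim) for m, w in zip(models, weights)) / sum(weights)


# Hypothetical stand-ins for trained models, kept trivial for illustration.
def amount_model(claim: dict) -> float:
    return min(claim["amount"] / 50_000, 1.0)


def history_model(claim: dict) -> float:
    return 0.9 if claim["prior_claims"] > 3 else 0.1


claim = {"amount": 20_000, "prior_claims": 5}
score = ensemble_score([amount_model, history_model], [0.5, 0.5], claim)
print(f"ensemble fraud score: {score:.2f}")  # 0.65 = 0.5*0.4 + 0.5*0.9
```

Neither model alone tells the whole story: one sees a moderate claim amount and the other a suspicious claims history, and the blended score reflects both.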

IBM suggests insurance claims fraud detection as an example use case for ensemble AI. Traditional neural networks provide an initial risk assessment and, when combined with large language models (LLMs), are intended to improve performance and accuracy. Similarly, ensemble AI techniques can drive advanced detection of suspicious financial activities, supporting compliance with regulatory requirements and mitigating the risk of financial crimes.
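A plausible shape for that insurance example is a two-stage cascade: a fast neural network scores every claim, and only the ambiguous ones are passed to a slower LLM whose opinion is blended in. The sketch below is a hypothetical illustration of that pattern. The thresholds, blend weight, and both scoring functions (a precomputed score, and a keyword check standing in for an LLM) are assumptions, not details IBM has published.

```python
from dataclasses import dataclass
from typing import Callable

Scorer = Callable[[dict], float]  # returns a risk score in [0, 1]


@dataclass
class Assessment:
    risk: float
    used_llm: bool


def assess_claim(claim: dict, nn: Scorer, llm: Scorer,
                 low: float = 0.2, high: float = 0.8, llm_weight: float = 0.5) -> Assessment:
    """Stage 1: a fast neural-network score for every claim.
    Stage 2: only ambiguous claims (low <= risk <= high) go to the slower LLM,
    and its score is blended with the first one."""
    risk = nn(claim)
    if low <= risk <= high:
        blended = (1 - llm_weight) * risk + llm_weight * llm(claim)
        return Assessment(blended, used_llm=True)
    return Assessment(risk, used_llm=False)


# Hypothetical stand-ins: a precomputed network score, and a keyword check
# playing the part of an LLM reading the claim's free-text notes.
def nn_score(claim: dict) -> float:
    return claim["nn_score"]


def llm_score(claim: dict) -> float:
    return 0.9 if "dated after the incident" in claim["notes"] else 0.3


routine = {"nn_score": 0.05, "notes": "windscreen chip, photos attached"}
borderline = {"nn_score": 0.5, "notes": "repair receipt dated after the incident"}
print(assess_claim(routine, nn_score, llm_score))     # low risk, no LLM call
print(assess_claim(borderline, nn_score, llm_score))  # risk about 0.7, LLM consulted
```

Keeping the LLM out of the path for clear-cut claims is what makes this kind of design practical when every transaction needs a low-latency answer.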

The new Telum II processor and IBM Spyre Accelerator are engineered for a broader set of AI use cases to accelerate and deliver on client business outcomes. We look forward to seeing them in the new IBM mainframes next year.
