Sunday 8 September 2024

A chip off the new block

IBM may not have announced a new mainframe, but it has told us all about the chips that will be powering those mainframes – and they are very much aimed at making artificial intelligence (AI) software run faster and better.

Let’s take a look at the details.

Back in 2021, we heard about the Telum I processor with its on-chip AI accelerator for inferencing. Now we hear that the Telum II processor has improved AI acceleration and is joined by the IBM Spyre™ Accelerator. We’ll get to see these chips in 2025.

The new chip has been developed using Samsung 5nm technology and has 43 billion transistors. It will feature eight high-performance cores running at 5.5GHz. The Telum II chip will include a 40% increase in on-chip cache capacity, with the virtual L3 and virtual L4 growing to 360MB and 2.88GB respectively. The processor integrates a new data processing unit (DPU) specialized for IO acceleration and the next generation of on-chip AI acceleration. These hardware enhancements are designed to provide significant performance improvements for clients over previous generations.

Because the integrated DPU has to handle tens of thousands of outstanding I/O requests, instead of putting it behind the PCIe bus, IBM has connected it coherently and given it its own L2 cache. IBM says this increases performance and power efficiency. In fact, the chip carries ten 36MB L2 caches – more than one per core, since the DPU gets its own – alongside the eight fixed-frequency 5.5GHz cores, and together they make up the 360MB virtual L3. IBM claims the new design offers increased frequency, memory capacity, and an integrated AI accelerator core, allowing it to handle larger and more complex datasets efficiently. The onboard AI accelerator runs at 24 trillion operations per second (TOPS).

You might be wondering why AI on a chip is so important. IBM explains that its AI-driven fraud detection solutions are designed to save clients millions of dollars annually.

The compute power of each accelerator is expected to be improved by a factor of 4, reaching that 24 trillion operations per second we just mentioned. Telum II is engineered to enable model runtimes to sit side by side with the most demanding enterprise workloads, while delivering high throughput, low-latency inferencing. Additionally, support for INT8 as a data type has been added to enhance compute capacity and efficiency for applications where INT8 is preferred, thereby enabling the use of newer models.
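To see why INT8 support matters, here is a minimal sketch – plain NumPy, nothing IBM-specific – of symmetric INT8 quantization: FP32 weights are scaled into the −127…127 range, stored as one byte each, and scaled back at compute time. The quarter-sized data type is what buys the extra compute capacity and memory bandwidth, at the cost of a small rounding error.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor quantization of FP32 weights to INT8."""
    scale = np.max(np.abs(weights)) / 127.0   # one FP scale for the whole tensor
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate FP32 values from the INT8 representation."""
    return q.astype(np.float32) * scale

weights = np.array([0.5, -1.2, 0.03, 0.98], dtype=np.float32)
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# INT8 uses a quarter of the memory of FP32; the rounding error
# introduced is bounded by half of one quantization step (scale / 2).
print(np.max(np.abs(weights - restored)))
```

The per-tensor scale here is the simplest scheme; real accelerators typically use finer-grained (per-channel) scales, but the trade-off is the same.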

New compute primitives have also been incorporated to better support large language models within the accelerator. They are designed to support an increasingly broader range of AI models for a comprehensive analysis of both structured and textual data.

IBM has also made system-level enhancements in the processor drawer. These allow each AI accelerator to accept work from any core in the same drawer, improving load balancing across all eight of those AI accelerators. Each core therefore has access to more low-latency AI acceleration – up to 192 TOPS (eight accelerators × 24 TOPS each) when fully configured across the drawer.

Brand new is the IBM Spyre Accelerator, which was jointly developed with IBM Research and IBM Infrastructure development. It is geared toward handling complex AI models and generative AI use cases. The Spyre Accelerator will contain 32 AI accelerator cores that will share a similar architecture to the AI accelerator integrated into the Telum II chip. Multiple IBM Spyre Accelerators can be connected into the I/O Subsystem of IBM Z via PCIe.

The integration of Telum II and Spyre accelerators eliminates the need to transfer data to external GPU-equipped servers, thereby enhancing the mainframe's reliability and security, and can result in a substantial increase in the amount of available acceleration.

Both the IBM Telum II and the Spyre Accelerator are designed to support a broader, larger set of models for what are called ensemble AI use cases. Ensemble AI leverages the strengths of multiple AI models to improve the overall performance and accuracy of a prediction compared with any individual model.

IBM suggests insurance claims fraud detection as an example of an ensemble AI method. A traditional neural network provides a fast initial risk assessment; combining it with a large language model (LLM) is designed to enhance performance and accuracy. Similarly, these ensemble AI techniques can drive advanced detection of suspicious financial activities, supporting compliance with regulatory requirements and mitigating the risk of financial crimes.
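As an illustration of the ensemble idea – the function names, weights, and thresholds below are entirely hypothetical, not IBM's – an ensemble can be as simple as a weighted average of two model scores, where the fast neural network and the slower, text-aware LLM each vote on the same claim:

```python
def ensemble_fraud_score(nn_score: float, llm_score: float,
                         nn_weight: float = 0.6) -> float:
    """Weighted average of two model scores, each in [0, 1]."""
    return nn_weight * nn_score + (1 - nn_weight) * llm_score

def triage(score: float, threshold: float = 0.7) -> str:
    """Turn a combined risk score into a claims-handling decision."""
    return "flag for review" if score >= threshold else "auto-approve"

claim_nn = 0.55    # traditional model: borderline risk
claim_llm = 0.95   # LLM spots suspicious language in the claim text
score = ensemble_fraud_score(claim_nn, claim_llm)
print(round(score, 2), triage(score))  # 0.71 flag for review
```

Neither model alone would have flagged this claim at the 0.7 threshold; the combination does, which is the accuracy benefit the article describes.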

The new Telum II processor and IBM Spyre Accelerator are engineered for a broader set of AI use cases to accelerate and deliver on client business outcomes. We look forward to seeing them in the new IBM mainframes next year.

 

Sunday 1 September 2024

Cybersecurity Assistance

There are two areas that I am particularly interested in: artificial intelligence (AI) and mainframe security. And IBM has just announced a generative AI Cybersecurity Assistant.

Worryingly, we know that ransomware is now available as a ready-made tool for attacking mainframe sites – even for people who may not have a lot of mainframe expertise. Launching a ransomware attack on an organization has been totally de-skilled. We also know from IBM’s Cost of a Data Breach Report 2024 that organizations using AI and automation lowered their average breach costs by US$1.8m compared with those not using them. In addition, organizations extensively using security AI and automation identified and contained data breaches nearly 100 days faster on average than organizations that didn’t use these technologies at all.

The survey also found that among organizations that stated they used AI and automation extensively, about 27% used AI extensively in each of these categories: prevention, detection, investigation, and response. Roughly 40% used AI technologies at least somewhat.

So that makes IBM’s new product good news for most mainframe sites. Let’s take a more detailed look.

Built on IBM’s watsonx platform, this new GenAI Cybersecurity Assistant for threat detection and response services enhances alert investigation for IBM Consulting analysts, accelerating threat identification and response. IBM says the new capabilities have reduced investigation times by as much as 48%, offering historical correlation analysis and an advanced conversational engine to streamline operations.

That means IBM’s managed Threat Detection and Response (TDR) Services used by IBM Consulting analysts now include the Cybersecurity Assistant module to accelerate and improve the identification, investigation, and response to critical security threats. The product “can reduce manual investigations and operational tasks for security analysts, empowering them to respond more proactively and precisely to critical threats, and helping to improve overall security posture for clients”, according to Mark Hughes, Global Managing Partner of Cybersecurity Services, IBM Consulting.

IBM’s Threat Detection and Response Services is said to be able to automatically escalate or close up to 85% of alerts; and now, by bringing together existing AI and automation capabilities with the new generative AI technologies, IBM’s global security analysts can speed the investigation of the remaining alerts requiring action. As mentioned earlier, the best figure they are quoting for reducing alert investigation times using this new capability is 48% for one client.

The Cybersecurity Assistant cross-correlates alerts and enhances insights from SIEM, network, Endpoint Detection and Response (EDR), vulnerability, and telemetry data to provide a holistic and integrated approach to threat management.

By analysing patterns of historical, client-specific threat activity, security analysts can better comprehend critical threats. Analysts will have access to a timeline view of attack sequences, helping them to better understand the issue and providing more context for investigations. The assistant can automatically recommend actions based on the historical patterns of analysed activity and pre-set confidence levels, which can reduce response times for clients and so reduce the amount of time that attackers are inside an organization’s network. By continuously learning from investigations, the Cybersecurity Assistant’s speed and accuracy are expected to improve over time.
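A toy sketch of what “pre-set confidence levels” driving recommended actions might look like – the thresholds and action names here are invented for illustration, not taken from IBM’s product:

```python
def recommend_action(confidence: float) -> str:
    """Map the model's confidence that an alert is malicious to a
    recommended analyst action, using a fixed confidence ladder."""
    if confidence >= 0.9:
        return "escalate to analyst immediately"
    if confidence >= 0.6:
        return "open investigation ticket"
    if confidence >= 0.3:
        return "enrich with threat intelligence and re-score"
    return "auto-close as benign"

# Walk a few alerts through the ladder:
for c in (0.95, 0.7, 0.4, 0.1):
    print(c, "->", recommend_action(c))
```

The point of pre-setting the levels is that low-confidence alerts can be closed or enriched automatically, so analysts spend their time only on the band that genuinely needs human judgement.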

The generative AI conversational engine in the Cybersecurity Assistant provides real-time insights and support on operational tasks to both clients and IBM security analysts. It can respond to requests, such as opening or summarizing tickets, as well as automatically triggering relevant actions, such as running queries, pulling logs, explaining commands, or enriching threat intelligence. By explaining complex security events and commands, IBM’s Threat Detection and Response Services can help reduce noise and boost overall security operations centre (SOC) efficiency for clients.

Anything that can accelerate cyber threat investigation and remediation has to be good, and this product does that through historical correlation analysis (discussed above). Its other significant feature is streamlining operational tasks, which it does through its conversational engine (also discussed above).

There really is an arms race between the bad actors and the rest of us. Anything that gives our side an advantage, however brief that advantage might be, has got to be good. Plus, it provides a stepping stone to the next advantage that some bright spark will give us. No-one wants their data all over the dark web, and few companies can afford the cost of fines for non-compliance as well as court costs and payments to the people whose data is stolen.