Mainframe Update: November 2024

Sunday, 24 November 2024

Tell me about ONNX and mainframe AI

Let’s start by finding out what ONNX is. It stands for Open Neural Network eXchange, and it’s described as an open-source AI (artificial intelligence) ecosystem with the aim of establishing open standards for representing machine learning algorithms and software tools to promote innovation and collaboration. You can get it from GitHub.

To put that another way, it means you can create and train AI models on any platform that you like, using any framework (eg PyTorch, TensorFlow, Caffe2, Scikit-learn, etc) you like, and ‘translate’ that into a standard format that can then be run on any other platform – and the one that we’re interested in is the mainframe.

ONNX was originally called Toffee and was developed by a team from Facebook, but was renamed in 2017. It’s supported by IBM, Microsoft, Huawei, Intel, AMD, Arm, Qualcomm, and others.

Developers may want to use different frameworks for a project because particular frameworks may be better suited to specific phases of the development process, such as fast training, network architecture flexibility, or inferencing on mobile devices. ONNX then facilitates the seamless exchange and sharing of models across many different deep learning frameworks. Another advantage of using ONNX is that it allows hardware vendors and others to improve the performance of artificial neural networks of multiple frameworks at once by targeting the ONNX representation.

ONNX provides definitions of an extensible computation graph model, built-in operators and standard data types, focused on inferencing (evaluation). Each computation dataflow graph is a list of nodes that form an acyclic graph. Nodes have inputs and outputs. Each node is a call to an operator. Metadata documents the graph. Built-in operators are to be available on each ONNX-supporting framework. Thanks to Wikipedia for the information in this format.
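To make the idea of a computation graph a little more concrete, here is a minimal sketch in plain Python. It is not the ONNX library or its file format: the Node class, the OPERATORS table, and the run_graph function are all made up for illustration, and the operator names simply echo some of ONNX's built-in ones. It assumes the list of nodes is already in an order where every input has been produced before it is needed, which is always possible for an acyclic graph.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical operator table: the names echo ONNX built-in operators,
# but these are plain Python functions, not the ONNX implementations.
OPERATORS: Dict[str, Callable[..., float]] = {
    "Add": lambda a, b: a + b,
    "Mul": lambda a, b: a * b,
    "Relu": lambda a: max(0.0, a),
}


@dataclass
class Node:
    op_type: str       # which operator this node calls
    inputs: List[str]  # names of the values it consumes
    output: str        # name of the value it produces


def run_graph(nodes: List[Node], feeds: Dict[str, float]) -> Dict[str, float]:
    """Evaluate each node in turn; the list is assumed to be topologically sorted."""
    values = dict(feeds)
    for node in nodes:
        args = [values[name] for name in node.inputs]
        values[node.output] = OPERATORS[node.op_type](*args)
    return values


# A three-node graph for y = Relu(x * w + b)
graph = [
    Node("Mul", ["x", "w"], "xw"),
    Node("Add", ["xw", "b"], "sum"),
    Node("Relu", ["sum"], "y"),
]

result = run_graph(graph, {"x": 2.0, "w": -3.0, "b": 1.0})
print(result["y"])  # 2 * -3 + 1 is negative, so Relu gives 0.0
```

Because the graph is only data (operator names plus named inputs and outputs), anything that understands the same operator set can run it, which is the portability point being made above.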

So, we saw in that list of vendors that IBM is involved in the project. How is ONNX used on a mainframe? I know part of the answer to that because I watched a fascinating presentation by Megan E Hampton, IBM – Advisory Software Engineer, at the excellent GSE UK conference at the start of the month. Here’s what she told her audience.

Currently, on the mainframe, there aren’t many tools available for the optimization of AI models. That’s where ONNX comes in. It is an open format for representing AI models. ONNX defines a computation graph model, as well as definitions of built-in operators and standard data types.

ONNX uses a standard format for representing machine learning (ML) and deep learning (DL) models. ONNX models are generated by supported DL and ML frameworks or converted from other formats by converting tools. ONNX models can be imported into multiple frameworks and runtime engines and executed/accelerated by heterogeneous hardware and execution environments.

Among the benefits of using ONNX on a mainframe are that it:

Allows clients to use popular tools and frameworks to build and train.
Makes assets portable to multiple Z operating systems.
Optimizes and enables seamless use of IBM Z hardware and software acceleration investments.

But what’s the next stage? How do you get from an AI model to something useful that can run on a mainframe? That’s where the IBM Z Deep Learning Compiler (zDLC) comes in. It uses the open-source ONNX-MLIR project to compile .onnx deep learning AI models into shared libraries. The resulting shared libraries can then be integrated into C, C++, Java, or Python applications.

zDLC takes the ONNX (model) as input, and generates a single binary. It handles static and dynamic shapes as well as multiple data representations. And it exploits parallelism via OpenMP. OpenMP (Open Multi-Processing) is an application programming interface (API) that supports multi-platform shared-memory multiprocessing programming in C, C++, and Fortran. It consists of a set of compiler directives, library routines, and environment variables that influence run-time behaviour.

Multi-level intermediate representation (MLIR) significantly reduces the cost of building domain-specific compilers. It connects existing compilers together through a shared infrastructure. It’s part of the LLVM compiler infrastructure project and follows LLVM governance. LLVM and MLIR are new and powerful ways of writing compilers that are modular and generic. MLIR is flexible, and introduced the concept of ‘dialects’.

Think of it like this:

ONNX (the AI model format) plus MLIR (the compiler infrastructure) gives you ONNX-MLIR, which is what the IBM Z Deep Learning Compiler uses to compile the AI models.

So, just to explain these further, MLIR is a unifying software framework for compiler development. It is a sub-project of the LLVM Compiler Infrastructure project.

LLVM is a set of compiler and toolchain technologies that can be used to develop a frontend for any programming language and a backend for any instruction set architecture. LLVM is designed around a language-independent intermediate representation (IR) that serves as a portable, high-level assembly language that can be optimized with a variety of transformations over multiple passes. Interestingly, LLVM isn't an acronym, although, originally, it stood for Low Level Virtual Machine.

Let’s go back to the mainframe again. We can build and train a model in any popular framework (PyTorch, TensorFlow, etc) on any platform, which allows the maximum flexibility possible. Models are then converted to the ONNX interchange format for use on the mainframe. We can leverage z/OS Container Extensions (zCX) if we want to run the application inside a Docker container on z/OS as part of a z/OS workload. We can also run the applications on zIIP engines, which won’t impact the 4-hour rolling average cost of general processors. The IBM zDLC (Deep Learning Compiler) enables existing models to quickly and easily take advantage of the IBM z16 Telum processor's Integrated Accelerator for AI.
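The point about the 4-hour rolling average is easier to see with some arithmetic. The sketch below is a simplification with invented figures: it uses one sample per hour, whereas the real calculation is done by the system over much finer intervals, and the names general_msu and inference_msu are hypothetical. It compares the peak rolling average when the inferencing work is counted against general processors with the peak when that work is assumed to run on zIIP and so is left out of the calculation.

```python
from collections import deque
from typing import Iterable, List


def rolling_average(samples: Iterable[float], window: int) -> List[float]:
    """Average of the most recent `window` samples at each point in time."""
    recent = deque(maxlen=window)
    averages = []
    for value in samples:
        recent.append(value)
        averages.append(sum(recent) / len(recent))
    return averages


# Invented hourly figures, for illustration only.
general_msu = [100.0, 120.0, 110.0, 130.0, 90.0, 100.0]  # existing workload
inference_msu = [20.0, 20.0, 40.0, 40.0, 20.0, 20.0]     # AI inferencing work

WINDOW = 4  # four one-hour samples stand in for the 4-hour window

# Case 1: inferencing runs on general processors and is counted.
combined = [g + i for g, i in zip(general_msu, inference_msu)]
peak_on_gp = max(rolling_average(combined, WINDOW))

# Case 2: inferencing is assumed to run on zIIP and is not counted.
peak_on_ziip = max(rolling_average(general_msu, WINDOW))

print(peak_on_gp, peak_on_ziip)  # 145.0 115.0
```

With these made-up numbers, the peak that drives the bill stays at 115 rather than rising to 145.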

Looking at the Deep Learning Compiler Flow: the ONNX model (dialect) is lowered and transformed through multiple phases of intermediate representation (IR) to a dialect that can be processed by an LLVM compiler. The output of the LLVM compilation and build is a shared library object that can be deployed.

It all seems so simple when it’s explained. I expect we’re going to hear a lot more about all this.

Sunday, 10 November 2024

More on security

Following on from last week’s blog entitled Insider threats and SMF, I recently got a press release from application security SaaS company Indusface putting some figures on the problem that organizations are facing from their own employees. It’s not just that there is a very small minority of employees who seem intent on bringing their company down by deleting data or launching ransomware attacks; there also seems to be a huge pool of people who inadvertently give away information, or open malware, or click on ‘dodgy’ links that leave companies wide open to serious attacks by bad actors.

The people at Indusface have used global search data from AHrefs to find the world's top five questions and concerns about cyber security in the workplace. The data from AHrefs, which was correct as of October 2024, can be found here. They have then come up with their own suggested answers to those searches.

I’d like to start with the question that came in fourth place, which was “What percentage of breaches are human error responsible for?” There were similar searches on “Human error cyber security”.

Their answer was: “According to data by Indusface, 98% of all cyber-attacks rely on human error or a form of social engineering. Social engineering breaches leverage human error, emotions, and mistakes rather than exploiting technical vulnerabilities. Hackers often use psychological manipulation, which may involve coaxing employees to reveal sensitive information, download malicious software, or unknowingly click on harmful links. Unlike traditional cyberattacks that rely on brute force, social engineering requires direct interaction between attacker and victim.

“Given that human error can be a major weak link in cyber security, the best way to prevent these attacks is to put in place education and training on the types of attacks to expect and how to avoid these. That said, implementing a zero-trust architecture, where requests for every resource are vetted against an access policy, will be paramount in stopping attacks from spreading even when a human error results in a breach. Also, make sure that the applications are pen tested for business logic and privilege escalation vulnerabilities so that the damage is minimized.

“Basics such as standard best practices across the board, secure communications, knowing which emails to open, when to raise red flags, and exercising extreme caution when accepting offers will go a long way in preventing human errors that lead to breaches.”

Let’s look at the other search terms in the top five. In first place, with the most searches, was “Why is cyber security training so important for business?” There were similar searches for “Cyber security for business”.

The answer from Indusface was: “With data breaches costing businesses an average of $4.45 million globally in the last year (according to IBM’s Cost of a Data Breach Report 2024), it raises the question of just how critical it is for organizations to provide employees with comprehensive training on what constitutes sensitive data and how they can protect it, as well as what is at stake if they do not adhere to the policies.

“And training doesn’t have to be monotonous. For example, set up phishing email simulators to engage the team and allow them to see the potential dangers in action. These simulations show how quickly and easily attacks can happen, helping employees develop practical, hands-on skills for spotting suspicious activity.

“Cybersecurity threats evolve constantly, so training should be regular, not a one-time event. Regular training and guidance will ensure that employees receive tailored guidance on securing their work equipment, home offices, use of VPNs, and recognizing the unique threats posed by both in-office and home working environments.”

The second most frequent searches were “How is AI used in cyber security?” or simply “Cyber Security AI”.

Indusface said: “The biggest problem with security software, especially website and API protection, is the prevalence of false positives. False positives are when legitimate users are prevented from accessing an application. So notorious is this problem that 50%+ of businesses worldwide have implemented Web Application and API Protection/Web Application Firewall (WAAP/WAF) solutions and left them on log mode. This means that attacks go through the WAF and they are at best used as log analysis tools after a breach.

“Effectively using AI can help with eliminating or reducing false positives to a bare minimum and encourage more businesses to deploy WAFs in block mode.

“The other problem with security software is letting an attack go through. These are also called false negatives. Using AI on past user behaviour and attack logs can effectively prevent any attacks that don’t conform to typical user behaviour.”
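For anyone who wants to pin down those two terms, the sketch below shows how false positive and false negative rates are usually calculated from a count of outcomes. The WafOutcomes class and the traffic figures are invented for illustration; they are not Indusface data and not the output of any particular WAF.

```python
from dataclasses import dataclass


@dataclass
class WafOutcomes:
    blocked_attacks: int     # attacks correctly stopped (true positives)
    blocked_legitimate: int  # legitimate users wrongly stopped (false positives)
    allowed_attacks: int     # attacks wrongly let through (false negatives)
    allowed_legitimate: int  # legitimate users correctly let through (true negatives)


def false_positive_rate(o: WafOutcomes) -> float:
    """Share of legitimate requests that were blocked."""
    return o.blocked_legitimate / (o.blocked_legitimate + o.allowed_legitimate)


def false_negative_rate(o: WafOutcomes) -> float:
    """Share of attacks that got through."""
    return o.allowed_attacks / (o.allowed_attacks + o.blocked_attacks)


# Invented figures: 10,000 legitimate requests and 100 attacks.
sample = WafOutcomes(
    blocked_attacks=90,
    blocked_legitimate=50,
    allowed_attacks=10,
    allowed_legitimate=9950,
)

print(f"False positive rate: {false_positive_rate(sample):.1%}")  # 0.5%
print(f"False negative rate: {false_negative_rate(sample):.1%}")  # 10.0%
```

Even a false positive rate as low as the one in this example still means 50 legitimate users turned away, which helps explain why so many sites leave their WAF in log mode.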

Third in their list was “How can you protect your home computer?” and “Home cyber security”. They suggest that by 2025, according to a Forbes article, approximately 22% of workers will work remotely. They go on to ask, with such a significant increase in remote roles, how can employers ensure their employees' home computers remain protected?

Their answer was: “Remote working means people are working in less secure environments and their devices are more exposed to data breaches both digitally and physically. Many remote workers are using the same device for professional and personal use, or even accessing company data on devices shared with other household members.

“Employers should ensure strong password management, including using automatic password generators that create extra secure passwords, and never duplicate these across accounts. Multi-factor authentication also provides a secure method of verifying your identity, making it harder for hackers to breach any accounts. Limiting what could be accessed on official devices is also important in thwarting attacks.

“That said, installing endpoint security software like antivirus, and keeping it updated, should be enough to protect most computers, unless you fall victim to an advanced phishing attack.”

The fifth most popular searches were, “What are the top 3 targeted industries for cyber-attacks?” and “Top industries cyber-attack”.

Here’s what Indusface said: “According to EC University, manufacturing, professional / business, and healthcare are the top 3 targeted industries.

“The manufacturing sector leads the world in cybercrime incidents according to Statista (2023). Attacks on the industry range from halting production lines, to the theft of intellectual property, and compromising the integrity of supply chains.

“The professional, business, and consumer services sector has also become an attractive target for cybercriminals due to its heavy reliance on sensitive data. Confidential client information and business insights are often targeted, leading to significant financial losses and damage to brand reputation, and client relationships.

“A breach in the healthcare industry can have dire consequences, from compromising sensitive patient data to disrupting critical medical services. Given the high value of medical records on the black market, there is an urgent need for stronger cybersecurity measures to protect both patient privacy and the integrity of healthcare systems.”

I thought it was useful to get another view on the ongoing issue of keeping your mainframe – and any other platforms your organization supports – safe from breaches. And keeping your employees alert at all times to potential threats.

Sunday, 3 November 2024

Insider threats and SMF

Many people think that SMF records will tell you everything that has happened at a site. And, if you link it to some kind of alerting software, it will act as the cornerstone of your mainframe’s security. And that, as they sleep snugly in their beds at night, is their mainframe security done and dusted.

Many people think that all the people who work for their organization and access their mainframes are intelligent and trustworthy, and are not really worth worrying about when their main focus should be on gangs trying to extort money or hostile nation states trying to destroy their country’s competitors, or just damage the infrastructure of any country they view as hostile to them. That’s where an organization’s main security focus should be, surely?

Let’s start by deciding what an insider threat actually is. First, there are people who are employed by an organization. They have a valid userid and password and have a legitimate right to be accessing the mainframe. Now, every so often, humans will make mistakes. Some are small – and some can be quite major. It may be the case that your trusted insider accidentally deletes files or makes some other changes to the mainframe. Provided that person owns up straightaway, the IT team can usually solve the problem fairly promptly. Files can be restored from backups before other batch jobs that use those files are scheduled to run. And chaos can be averted.

Other insiders may be more malicious. They may not have got the internal promotion they were expecting or the pay rise that they needed. Other members of staff may have problems outside of the office, for example an increasing drug habit or an increasing use of alcohol. They may be running up gambling debts as they try to win back the money they have lost. Both groups are a problem. The disgruntled insiders may well deliberately cause damage to data or applications. They may have the authority to make other changes. And the second group of addicted users may well be manipulated by organized crime to infect the mainframe with some kind of malware that the bad actors associated with those criminals can use to launch a ransomware attack.

These days, the disgruntled employees can access Ransomware as a Service (RaaS) applications and launch an attack on the mainframe – hoping that the money they get from the ransom will compensate them for the money the company didn’t give them. It will also have to be enough to support their lifestyle once they go on the run.

Criminal gangs are also on the lookout for credentials that can get them into the mainframe. Disgruntled staff or employees who need money to fund their habits will be approached and offered money for their userids and passwords. Using these, the bad actors can do what they want on the mainframe, safe in the knowledge that most tools processing SMF records won’t identify unusual activity by those accounts.

There’s another group of employees that might be targeted by criminal gangs, and those are people who need money. It may be that an ageing relative needs to go into a home and they need money to pay for that relative’s care. It may be that a family member needs an operation that needs to be paid for. Or a family member may need an expensive medication that they will have to pay for. These people may be vulnerable to exploitation by criminal gangs.

Of course, ordinary members of staff may be tricked by the use of an AI simulating the voice of their manager, who asks to ‘borrow’ the employee’s userid and password to do some work over the weekend.

Typically, security tools won’t send alerts if valid userids and passwords are used. And if the settings are changed so that an alert is sent, you get the situation where staff get so many false positives that they tend to ignore the messages.

Let’s see what the Cost of a Data Breach Report 2024 from IBM had to say about insider threats. The report says that the global average cost of a data breach in 2024 is US$4.88m, and the USA has the highest average data breach cost at US$9.36m. Compared to other vectors, malicious insider attacks resulted in the highest costs, averaging US$4.99 million. It goes on to say that among other expensive attack vectors were business email compromise, phishing, social engineering, and stolen or compromised credentials.

Using compromised credentials benefited attackers in 16% of breaches. Compromised credential attacks can also be costly for organizations, accounting for an average US$4.81 million per breach. Phishing came in a close second, at 15% of attack vectors, but in the end cost more, at US$4.88 million. Malicious insider attacks were only 7% of all breach pathways.

The report also found that the average time to identify and contain a breach fell to 258 days. However, where credentials were stolen or malicious insiders were involved, the average combined time to identify and contain the attack rose to 292 and 287 days respectively.

So, while insider threats aren’t the biggest threat to your mainframe, they are still a significant threat in the amount of money they can cost your organization as well as the amount of time it will take to recover from the attack. SMF is great, but security tools don’t usually send alerts when there is unusual activity by the accounts used by employees. So, these activities aren’t identified straight away and won’t be halted. Obviously, file integrity monitoring software would solve that problem before it became a serious problem. It would be able to identify an unusual activity and immediately suspend the job or user, and then send an alert. If it were a real systems programmer working at 2 in the morning from, say, Outer Mongolia, then, once this is confirmed, the job can be allowed to continue. But if you don’t have that type of software installed, guess what’s going to be filling your time for the next 258 days!
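To show the kind of check I mean, here is a very small sketch in Python. It is not how any real file integrity monitoring product works, and the AccessEvent layout is hypothetical rather than an SMF record format. It simply builds a picture of when each userid normally works and which datasets it normally touches, then flags anything that falls outside that picture, even though the userid and password were perfectly valid. Suspending the job or user is left out.

```python
from collections import defaultdict
from typing import DefaultDict, List, NamedTuple, Set, Tuple


class AccessEvent(NamedTuple):
    userid: str
    hour: int     # 0-23, hour of day the access happened
    dataset: str  # name of the dataset that was touched


Baseline = Tuple[DefaultDict[str, Set[int]], DefaultDict[str, Set[str]]]


def build_baseline(history: List[AccessEvent]) -> Baseline:
    """Record the hours and datasets each userid has used in the past."""
    usual_hours: DefaultDict[str, Set[int]] = defaultdict(set)
    usual_datasets: DefaultDict[str, Set[str]] = defaultdict(set)
    for event in history:
        usual_hours[event.userid].add(event.hour)
        usual_datasets[event.userid].add(event.dataset)
    return usual_hours, usual_datasets


def unusual_reasons(event: AccessEvent, baseline: Baseline) -> List[str]:
    """Return the reasons (if any) why this access looks out of character."""
    usual_hours, usual_datasets = baseline
    reasons = []
    if event.hour not in usual_hours.get(event.userid, set()):
        reasons.append("unusual hour")
    if event.dataset not in usual_datasets.get(event.userid, set()):
        reasons.append("dataset not accessed before")
    return reasons


# Invented history for one systems programmer who works office hours.
history = [
    AccessEvent("SYSP01", 9, "PROD.PAYROLL"),
    AccessEvent("SYSP01", 10, "PROD.LEDGER"),
    AccessEvent("SYSP01", 14, "PROD.PAYROLL"),
]
baseline = build_baseline(history)

night_access = AccessEvent("SYSP01", 2, "PROD.PAYROLL")
day_access = AccessEvent("SYSP01", 10, "PROD.LEDGER")

for event in (night_access, day_access):
    reasons = unusual_reasons(event, baseline)
    if reasons:
        print(f"ALERT {event.userid}: {', '.join(reasons)}")
```

Only the 2 am access raises an alert here. A baseline this crude would produce plenty of false positives in real life, so treat it as an illustration of the principle and nothing more.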

What I’m suggesting is that insider threats are a real issue, and SMF on its own isn’t enough.