Sunday 27 March 2022

Mainframes and wicked problems

I know what mainframes are, but what’s the definition of a wicked problem? Well, Wikipedia tells us: “In planning and policy, a wicked problem is a problem that is difficult or impossible to solve because of incomplete, contradictory, and changing requirements that are often difficult to recognize. It refers to an idea or problem that cannot be fixed, where there is no single solution to the problem; and ‘wicked’ denotes resistance to resolution, rather than evil. Another definition is ‘a problem whose social complexity means that it has no determinable stopping point’. Moreover, because of complex interdependencies, the effort to solve one aspect of a wicked problem may reveal or create other problems.”

So, what is the wicked problem that I have in mind? The answer is ransomware. What happens when the IT team suddenly realize that there has been a breach on their network? How do they react to this crisis situation?

According to IBM’s “Cost of a Data Breach Report” 2021, the average cost of a breach increased from $3.86 million to $4.24 million. The report also found that, on average, it took 212 days to identify a breach, and then another 75 days to contain it.

And if you think that your mainframe is probably safe because hackers usually focus on financial services companies, think again. IBM Security’s annual X-Force Threat Intelligence Index 2022 looks back over 2021. It points out that manufacturing became the most attacked industry in 2021, suffering 23% of all ransomware attacks. In previous years, financial services and insurance companies had been the biggest targets. The report also found that the exploitation of unpatched software accounted for 44% of ransomware attacks in 2021.

So, why is ransomware a wicked problem? It’s all down to ambiguity. For the IT team, there are a lot of things they don’t know, such as when the attack started, what data has been affected, and, perhaps more importantly, what system and application software (infrastructure) has been compromised by the hackers.

It’s a bit like the Johari window. There are things you know you know. Things you don’t know you know. Things you know that you don’t know. And things you don’t know you don’t know. These are the unknown unknowns. And that’s what makes dealing with a hack so hard. It’s even worse when ransomware messages start appearing on every terminal and printer, because then management (and everyone else) starts shouting at IT for answers. Who is responsible for the breach? When did it occur? What files have been corrupted? Has our infrastructure been compromised? And these (and many other) questions need to be answered very quickly, because senior management has called a meeting to decide whether or not to pay the ransom.

In addition to the wicked problem, there’s now the stress of everyone blaming IT and demanding results. How do the brains of IT staff respond? In an ideal world, people would weigh up the most logical answers and come to a conclusion. They would channel their inner Mister Spock. But the ambiguity and pressure mean that they are not working in the logical part of their brain (the cerebral cortex); they are working in the more primitive part, the limbic system. Here, incoming messages are run past the amygdala, which decides on the appropriate emotional response (joy, anger, disgust, sadness, surprise, or fear), and in an ambiguous situation, fear is the likely response. A fear response triggers the hypothalamus to get the body ready for fight or flight. In fact, the hypothalamus does two things. Firstly, it uses autonomic nerves to make the adrenal medulla start releasing adrenalin and noradrenalin. This is called the SAM (sympathomedullary) pathway. Secondly, and slightly more slowly, it releases hormones that cause the pituitary gland to release hormones that, in turn, cause the adrenal cortex to produce cortisol, better known as the stress hormone. This is called the HPA (Hypothalamus-Pituitary-Adrenal) axis. And, as your body fills with these fight-or-flight hormones, you get amygdala hijack, where the logical part of your brain gradually gets cut off from this primitive limbic part. Logical thought becomes almost impossible. Your body has gone into survival mode. Some people call it cognitive narrowing.

What can be done to stop the people needed to solve a breach from going into survival mode and becoming almost incapable of logical thought? What does the fire service do so that it can regularly send firefighters into dangerous situations?

The first thing to do is to practise the situation over and over again until it becomes second nature. It’s often suggested that organizations set up an incident response team that trains together regularly. That way, team members know what the others are going to do in a situation, and they can do their own part. Habits are stored in a part of the brain called the basal ganglia, and accessing them doesn’t involve the logical part of the brain. So, they can be used in the worst situations, like having the CEO standing behind you while you try to resolve the situation.

A useful addition is the use of decision trees, particularly fast and frugal trees. These are like flowcharts: the answer to each question either leads straight to a decision or moves on to the next question. They are often used in high-stress situations (like firefighting or operating theatres) to make quick decisions without overlooking important information, the kind of oversight that could otherwise result in a tragedy. A fast and frugal tree also provides a simple way to check that each task has been completed and that nothing important has been forgotten. One of the first tasks for the incident response team would be to create its fast and frugal tree, and then to keep it up to date as training proceeds.
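To make the idea more concrete, here is a minimal Python sketch of how a fast and frugal tree works: the questions are asked in a fixed order, and each one has an exit that leads straight to a decision. The cues, decisions, and names (`Cue`, `run_fft`, `CUES`, `FINAL`) are hypothetical illustrations, not a recommended ransomware playbook. A real incident response team would agree its own questions and their order during training.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Facts = Dict[str, bool]  # what the responder currently knows, as yes/no answers

@dataclass
class Cue:
    question: str                    # the question the responder answers
    answer: Callable[[Facts], bool]  # reads the yes/no answer from the facts
    exit_on: bool                    # the answer that ends the tree at this cue
    decision: str                    # the action taken on that exit

def run_fft(cues: List[Cue], final_decision: str, facts: Facts) -> str:
    """Ask each cue in order. The first answer matching a cue's exit_on
    returns that cue's decision. If no cue exits, the other branch of the
    last cue applies, represented here by final_decision."""
    for cue in cues:
        if cue.answer(facts) == cue.exit_on:
            return cue.decision
    return final_decision

# Hypothetical cues, most urgent first.
CUES = [
    Cue("Are ransom notes appearing on terminals or printers?",
        lambda f: f.get("ransom_note", False), True,
        "Invoke the full incident plan and isolate affected systems"),
    Cue("Have unexpected changes been found in system libraries or backups?",
        lambda f: f.get("unexpected_change", False), True,
        "Call in the incident response team and preserve evidence"),
    Cue("Is there an unusual logon (time, place, or privilege)?",
        lambda f: f.get("unusual_logon", False), True,
        "Suspend the userid and investigate"),
]
FINAL = "Record the event and continue monitoring"

print(run_fft(CUES, FINAL, {"unexpected_change": True, "unusual_logon": True}))
```

Because the tree stops at the first exit it reaches, that example returns the second cue’s decision even though the third cue would also have fired. Under pressure, the responder only ever has to answer one question at a time, in an order the team has already agreed.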

Organizations (ransomware is a whole-company problem) need to create and train incident response teams that can deal with a breach without this wicked problem causing amygdala hijack and stopping the ransomware attack from being dealt with effectively. And IT teams need to ensure that they have taken all possible preventative measures to stop a breach occurring in the first place.

Sunday 20 March 2022

Trouble getting new mainframe staff?


Many of us can remember a time when the machine room was buzzing. There could be as many as 20 operators on a shift, busily loading tapes and changing boxes of 11-inch multi-line paper in the printers. And outside the machine room were the programmers and others who kept everything going. I am talking about the days of punched cards and even paper tape as ways of inputting a program.

But that was all a long time ago. Somehow, organizations forgot all about their mainframes as times changed and people began to use laptops and mobile phones. The number of operators needed on a shift plummeted to none at all as the idea of a lights-out environment took hold. That’s not to say that mainframes weren’t still being used, just that DASD got bigger and fewer tapes (eventually almost none) needed changing. No one used punched cards to enter a program; it was all done on-screen. And printers tended to reside locally, and eventually very little printing was needed.

CICS, IMS, Db2 and z/OS itself continued to be improved and extended. And the days of a massive piece of hardware with a bank of flashing lights and switches evolved into something that could be rack mounted like any other server. Without the rowdiness of a room full of operators and the sci-fi-ness of a box with flashing lights, people simply thought less and less about the power and the importance of the mainframe. It gradually faded from many people’s consciousness – like a relative that you’ve not seen for a number of years.

However, the mainframe was still there and was doing a very important job. IBM tells us that 67 Fortune 100 enterprises still use mainframes. Included in those figures are 45 of the top 50 banks, eight of the top 10 telcos, and seven of the top 10 retailers. As you know, mainframes didn’t disappear.

So, as we reached 2019, mainframes were still doing their vitally important job, but most of the column inches published online and in print focused on cloud, mobile, and web. Then, in 2020, the pandemic hit and the world was suddenly working from home. Many people used this time to re-evaluate their lives. A lot of mainframe professionals were getting near (or even past) their ideal retirement age. And when they looked at their work/life balance, they concluded that there were better things they could be doing than going to meetings and battling to convince their organization of the mainframe’s importance. They could, instead, be playing golf, seeing their grandchildren, learning to play the piano again, and so much more. And once the crisis seemed to be over, a significant number of them retired.

But that’s all right, those companies thought, we have loads of people in the IT team; they can take over! The problem is that most of those people had no mainframe experience. And things like CICS and IMS are very complex pieces of software to understand. The rolling 4-hour average is not a concept that makes sense to people used to working with cloud-based applications. Organizations are quickly realizing that they need mainframe experts looking after the mainframe.

One option companies might consider is recruiting youngsters, who could be trained by the company’s mainframe experts before they retire. The problem with that idea is that many youngsters are not entering the job market. Following the pandemic, they have decided to stay in education for longer than similar cohorts in 2018 or 2019 would have done. That means there are fewer young people available for work just as older staff are leaving, which can be quite a problem for many of these Fortune 100 companies looking for mainframe staff. And, as we all know, a shortage of staff can lead to an increase in the salaries that the remaining staff can demand.

The problem isn’t just down to large organizations and the effects of the pandemic. For a long time now, universities have been dropping COBOL in favour of Python, JavaScript, Go, etc. But that leaves graduates joining an already crowded job market. Somehow, these youngsters need to be shown that their employment prospects and salary expectations would be much higher if they learned mainframe skills. The problem is that, like so many other people, colleges probably don’t know much about mainframes.

IBM and Broadcom do have training programmes to help train and on-board the next generation of mainframe talent. These programmes help to encourage younger people into the world of the mainframe.

And IBM has been making the mainframe more open to non-mainframers. It’s possible to develop mainframe applications using Microsoft Visual Studio Code (VS Code), and to write them in Java. There’s Zowe, an open-source framework that allows non-mainframers to treat mainframes like any other server. Zowe makes CI/CD tools like Jenkins, Bamboo, and UrbanCode available to developers, as well as tools like Ansible and SaltStack. IBM has also produced specific tools to allow non-mainframers to use the mainframe. For example, z/OS Management Facility (z/OSMF) provides system management functionality in a task-oriented, web browser-based UI with integrated user assistance. There’s also IBM Z Open Automation Utilities (ZOAU), which provides a runtime to support the execution of automation tasks on z/OS through Java, Python, and shell commands.

These are all great initiatives, and they will all help organizations to overcome the current difficulty of finding and recruiting mainframe staff. Perhaps the only real solution is for every mainframer to go out there and blow the trumpet for mainframes: to really let people know what mainframes can do. And, hopefully, non-mainframers will hear the message and boost the number of people available to work on mainframes.

Sunday 13 March 2022

The real priority for mainframers


I recently had a meeting with a fairly senior executive from a company, and our conversation turned to priorities. I asked exactly what the company saw as its priority for 2022-23. Not a surprising question, I didn’t think. The exec took a second or two to marshal their thoughts and explained that for this year they were going to focus on their supply chain and their processing, in order to maximize the profit from the business going forward. That sounded good. I asked them what else they were prioritizing, and they again thought for a moment and concluded that was pretty much it.

I asked them about staff wellbeing because that is something many companies are finding to be an issue. Some staff love being back in the office with the usual office banter, free heating, and supplies of hot drinks. Others love working from home and avoiding the commute, the hunt for somewhere to park, and constant interruptions when trying to work. And, seemingly, a lot of staff are changing jobs to get the work/life balance they feel is right for them. Others have found, after two years of Covid uncertainty, that mental health challenges that were kept in check before the outbreak are now causing them concern. I asked what steps they were taking about this and was told that they thought HR had this under control. Mmmh!

I next asked about equality and diversity. What steps were they taking to ensure there was no pay gap between men and women? Was there a glass ceiling holding back the promotion of women? Did their organization reflect the ethnic ratios that existed in the areas where they had offices? Was there a pay gap or glass ceiling affecting those members of the community? The exec thought that it did and that, again, HR had it under control. Mmmh!

So, then I asked what their corporate policy was on ransomware. What did they have in place to prevent phishing attacks? Had they moved to zero trust? Did they have any kind of insurance against hacking? Here the exec felt on safer ground, as they smiled and said that they did most of their computing on a mainframe, so they were safe. And they were migrating the Windows servers to the cloud. So, again they were going to be completely safe. Oh dear!

I mentioned mainframe sites that we know have been hacked: Luxottica, Logica, Nordea bank in Sweden, and the US Office of Personnel Management (OPM). I mentioned high-profile cyber-attacks from last year, including Colonial Pipeline and Kaseya. I talked briefly about why many companies wouldn’t want to reveal that they had been hacked. And I talked about the need to keep software up to date so that the latest known flaws are patched at their site, preventing bad actors from getting in that way.

I briefly talked about some easy ways to hack a mainframe, eg using CICSpwn, which is on GitHub; brute-force attacks; JES/FTP attacks; TN3270 emulation attacks; attacks using NJE; Nmap scripts; or ENUM scripts. I asked what training users had been given in identifying and avoiding phishing attacks. I mentioned key loggers embedded in attachments and interesting-sounding counterfeit websites that automatically send malware. I also mentioned that userids and passwords for lots of companies were available on the dark web. By now, the exec was busily writing things down and his face was looking a bit grim.

I asked who he thought hackers were, and he replied that they were just kids trying to see what they could do on their computers. I informed him that those days were gone; these days there were organized hacking gangs and nation-state hackers. It was all very organized. And, I told him, Ransomware as a Service (RaaS) was now readily available. So, any disgruntled ex-employee could get a nice payday from their former employer.

Then I asked how trustworthy all their employees who used the mainframe were. I got a “well yes, probably – um, I don’t really know” response. The truth is that mainframe sites need to assume that the bad guys are already inside their networks. And that’s why moving to zero trust is so important. Zero trust checks that the right person is accessing the right files from the right place. So, if a systems programmer often works at 2 in the morning, then everything is OK, unless today your sysprog is working from Outer Mongolia or somewhere else unusual.
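The “right person, right files, right place” idea can be sketched as a simple policy check. This is a minimal illustration under stated assumptions: the `Profile` and `Request` types, the `evaluate` function, and the example userid and dataset names are all hypothetical. On a real z/OS system, access to files is controlled by the external security manager (RACF, ACF2, or Top Secret), and contextual checks like these would be layered on by other tools.

```python
from dataclasses import dataclass
from typing import Dict, Set

@dataclass
class Profile:
    datasets: Set[str]   # files this person is entitled to use
    countries: Set[str]  # places they normally connect from
    hours: Set[int]      # hours of the day (0-23) when they normally work

@dataclass
class Request:
    user: str
    dataset: str
    country: str
    hour: int

def evaluate(req: Request, profiles: Dict[str, Profile]) -> str:
    """Return 'allow', 'challenge' (re-verify identity), or 'deny'."""
    profile = profiles.get(req.user)
    if profile is None or req.dataset not in profile.datasets:
        return "deny"        # unknown person, or not their file
    if req.country not in profile.countries or req.hour not in profile.hours:
        return "challenge"   # right person and file, but unusual place or time
    return "allow"

# Hypothetical profile: a sysprog who regularly works through the night from the UK.
PROFILES = {
    "SYSPROG1": Profile({"SYS1.PARMLIB"}, {"GB"}, {22, 23, 0, 1, 2, 3}),
}

print(evaluate(Request("SYSPROG1", "SYS1.PARMLIB", "GB", 2), PROFILES))  # allow
print(evaluate(Request("SYSPROG1", "SYS1.PARMLIB", "MN", 2), PROFILES))  # challenge
```

Working at 2 in the morning from the usual place is allowed. The same request from Mongolia (`"MN"`) is challenged rather than refused outright, and a request for a dataset outside the profile (say, a hypothetical `PAYROLL.MASTER`) is denied.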

And how much does a ransomware attack cost? According to IBM’s “Cost of a Data Breach Report” 2021, the average cost of a breach increased from $3.86 million to $4.24 million. For US-based companies, the average was $9.05 million per incident. The healthcare sector had the highest average cost, at $9.23 million per incident. For companies that experience a mega breach (between 50 million and 65 million records stolen), the average cost is $401 million! By now, the executive I was talking to was beginning to turn green.

The other thing, I continued, is just how long hackers can be inside a company before anyone notices. IBM’s report puts the average time to identify a breach at 212 days. And, worryingly, it takes on average another 75 days to contain it. That means you could have been breached over six months ago.
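The arithmetic behind that warning is simple. It uses the two averages quoted from IBM’s report: 212 days, then another 75 days to contain the breach. This small Python calculation assumes an average month of 365.25 / 12 days.

```python
DAYS_TO_IDENTIFY = 212         # average from IBM's 2021 report
DAYS_TO_CONTAIN = 75           # further average days to contain the breach
DAYS_PER_MONTH = 365.25 / 12   # assumption: an average calendar month

lifecycle_days = DAYS_TO_IDENTIFY + DAYS_TO_CONTAIN
months_undetected = DAYS_TO_IDENTIFY / DAYS_PER_MONTH
lifecycle_months = lifecycle_days / DAYS_PER_MONTH

print(lifecycle_days, round(months_undetected, 1), round(lifecycle_months, 1))
# 287 7.0 9.4
```

So the hackers have roughly seven months inside before anyone notices, and the whole episode lasts over nine months from start to containment.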

The hackers get in, and they escalate their privileges so they can do whatever they want. They then find all the good stuff (there’s no rush) and they take a copy of it. They corrupt the backups so you can’t recover the data, and then they encrypt the data. Then they give you the ransom demand, usually with a warning not to tell anyone, particularly not security specialists. They get money from your company when you pay the ransom. And they get money from the dark web when they sell your data.

And that’s why mainframe sites need something like File Integrity Monitoring software (eg FIM+ from MainTegrity) to alert sites when unexpected changes are made to applications or data or even backups. The FIM+ product can alert you to who made the change and when. And it can tell you exactly what was changed. More than that, it can tell you exactly which backup (which it has verified) to recover from.
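The basic technique behind file integrity monitoring can be sketched with cryptographic hashes. You record a fingerprint of each file while it is known to be good, then compare later fingerprints against that baseline. This is a generic illustration, not a description of how FIM+ or any other product works internally. The function names and example member names are hypothetical, and the in-memory dictionaries stand in for real datasets or library members. Working out who made a change, and when, needs audit data (on z/OS, for example, SMF records), which this sketch does not attempt.

```python
import hashlib
from typing import Dict, List, Tuple

def fingerprint(members: Dict[str, bytes]) -> Dict[str, str]:
    """Return a SHA-256 digest of each member's contents."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in members.items()}

def compare(baseline: Dict[str, str],
            current: Dict[str, str]) -> Tuple[List[str], List[str], List[str]]:
    """Return (added, removed, changed) member names, each sorted."""
    added = sorted(current.keys() - baseline.keys())
    removed = sorted(baseline.keys() - current.keys())
    changed = sorted(name for name in baseline.keys() & current.keys()
                     if baseline[name] != current[name])
    return added, removed, changed

# Hypothetical library members: a known-good state and a tampered state.
known_good = {"PAYROLL": b"original load module", "BILLING": b"billing code"}
tampered = {"PAYROLL": b"patched load module", "BILLING": b"billing code",
            "BACKDOOR": b"unexpected new member"}

baseline = fingerprint(known_good)
added, removed, changed = compare(baseline, fingerprint(tampered))
print(added, removed, changed)  # ['BACKDOOR'] [] ['PAYROLL']
```

The same comparison run against a backup before restoring from it is one way to check that the backup still matches a known-good state.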

I wasn’t trying to make a sale; that’s not my role. I was trying to alert the executive to exactly how vulnerable his company was to a ransomware attack, and to the steps that needed to be taken to ensure that they were still in business to carry out the other things on their priority list.

Sunday 6 March 2022

Mainframe as a Service


IBM has recently announced an expansion of its IBM Z mainframe offering on IBM Cloud. The changes, according to IBM, have reduced the time it takes to access z/OS development and test environments from days to minutes (around six minutes, they suggest). Not surprisingly, they also suggest that IBM Cloud for z/OS development is 15 times faster than using an x86 environment.

In effect, IBM will be offering virtual machines that people can use as mainframe test and development environments with the intention of creating cloud-based virtual production environments. The set-up is currently available as a closed experiment, but will become generally available in the second half of the year. Users get on-demand access to z/OS and can develop and test applications that they are working on.

IBM is actually delivering IBM Wazi as a Service (Wazi aaS), which is what makes the z/OS capabilities available on IBM Cloud.

According to the IBM on-boarding site: “IBM Wazi Developer is a productive development environment that fully integrates into any enterprise-wide standard DevOps pipeline. IBM Wazi Developer delivers three key components – Wazi Sandbox, Wazi Code, and Wazi Analyze to let you analyze, develop, build, test, and deploy z/OS application code with modern tools.”

Using Wazi as a Service, users can create development and test systems in IBM Cloud Virtual Private Cloud. And, as mentioned earlier, IBM claims this can be done in just six minutes. Users can then manage virtual machine-based compute, storage, and networking resources in a private, secure space that they have defined. Wazi Image Builder lets users create custom images from their on-premises LPAR. Wazi Image Builder includes a web UI with role-based access and REST APIs to streamline the creation process.

According to the press release, Wazi aaS is being designed to help developers:

  • Increase speed and agility with on-demand access to z/OS for development and test
  • Accelerate DevOps practices with predictable and flexible consumption-based pricing
  • Reduce the need for specialised skills with a consistent cloud-native development experience.

IBM is also working with ecosystem partners such as TCS and BMC to help IBM Z clients accelerate the modernization of their applications, data, and processes in an open hybrid cloud architecture.

You might be wondering how much of the cloud marketplace IBM has. Statista has some estimates. They reckon that AWS has 33 percent, Azure 21 percent, and Google Cloud has 10 percent. Alibaba has six percent, and then comes IBM with four percent. So, anything that boosts that figure has got to be good for the company. Just out of interest, Salesforce and Tencent have three percent each, and Oracle has two percent. I wonder how much IBM would have to pay to buy AWS?

Running test and development in the cloud makes a great deal of sense for most organizations, which can then keep the mainframe in the data centre running just production workloads. It reduces the need to worry about development workloads pushing up the rolling 4-hour average (R4HA) of MSUs (Millions of Service Units) consumed at a site. And it also gives organizations better control over their costs.
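To see why development spikes matter, here is a minimal Python sketch of a rolling 4-hour average. It assumes MSU consumption is sampled every five minutes, so the window holds 48 samples, and it treats the time before the first sample as zero usage. The function name and sample figures are hypothetical. This illustrates the averaging only; it is not IBM’s exact WLM or SCRT calculation.

```python
from collections import deque
from typing import Iterable, List

INTERVAL_MINUTES = 5
WINDOW = 4 * 60 // INTERVAL_MINUTES  # 48 five-minute samples make up four hours

def rolling_4h_average(msu_samples: Iterable[float]) -> List[float]:
    """Return the rolling 4-hour average after each five-minute sample."""
    window = deque(maxlen=WINDOW)  # the oldest sample drops out automatically
    averages = []
    for msu in msu_samples:
        window.append(msu)
        # Divide by the full window size, so missing history counts as zero.
        averages.append(sum(window) / WINDOW)
    return averages

# Hypothetical day: four steady hours at 100 MSUs, then a one-hour
# development test run pushing consumption up to 400 MSUs.
samples = [100] * 48 + [400] * 12
r4ha = rolling_4h_average(samples)
print(r4ha[47], max(r4ha))  # 100.0 175.0
```

A one-hour test run lifts the peak R4HA from 100 to 175 MSUs, even though the average smooths out the 400 MSU spike. Sub-capacity software charges are generally based on the month’s peak R4HA, so moving that kind of work off the production mainframe is where the cost control comes from.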

I’m sure that there will be big demand for this service once it goes live later in the year, not only for the reasons mentioned above, but also because anyone (with the right credentials) will be able to go to IBM Cloud and quickly provision Z resources. They can then rapidly develop and test applications, and bring those applications back into their on-prem environment for very rapid production deployment. It fills a need that many companies are, perhaps, only just realizing that they have.